Preface


THIS BOOK WAS DEVELOPED FROM A COURSE OF LECTURES ON 
sample survey techniques which I gave for a few years at North Caro- 
lina State College to students who intended to make their careers in 
the field of statistics. The purpose of the book is to present a reason- 
ably comprehensive account of sampling theory as it has been de- 
veloped for use in sample surveys, with sufficient illustrations to show 
how the theory is applied in practice, and with a supply of exercises to 
be worked by the student. My hope is that the book will be useful 
both as the basis of a course on sample survey techniques in which the 
major emphasis is on theory, and for individual reading by the student
who does not have access to formal instruction.

As an indication of the level at which the book is directed, the 
minimum mathematical equipment necessary for an easy understand- 
ing of the proofs is a knowledge of calculus as far as the determination 
of maxima and minima (using Lagrange’s multipliers where required), 
plus a familiarity with elementary algebra, and especially with the use 
of summation signs. On the statistical side, the book presupposes an 
introductory course which includes such topics as combinatorial prob- 
abilities, expected values and their properties, means and standard 
deviations, the normal, binomial, and multinomial distributions, con- 
fidence limits, Student’s t-test, linear regression, and the simpler types 
of analysis of variance. Occasionally, reference is made to more ad- 
vanced statistical results, since I have tried to point out the relation 
between sample survey theory and the main stream of statistical 
theory. In the early parts of the book, each step in a proof should be 
readily apparent from the previous steps; towards the end, where the 
proofs are more condensed, most readers will find that some work with 
pencil and paper is necessary to follow the steps in detail. 

Readers with advanced training in probability may find the argu- 
ments by which theorems are established rather pedestrian. In a sense, 
sample survey theory is easy, because thus far it has dealt mainly with 
means and variances. By the use of powerful operational methods, the 
bulk of the existing theory can now, I believe, be developed in a very 
compact space as particular cases of a few general results. Such a 
development would be illuminating in clarifying the interrelationships
between the different parts of the subject, and might prove a stimulus 
to further research and discovery. But my experience in teaching has 
been that most students who wish to learn something about sampling 
theory find this kind of presentation heavy going, and prefer a more 
leisurely progress. 

Sampling theory and practice have both grown so much in the past 
ten years that an adequate coverage of the two aspects of sampling 
now requires a lengthy volume. Although this book is not intended to 
contain a thorough discussion of sampling practice, it does endeavor 
to show how the various topics that comprise sampling theory arise 
from problems in sampling practice. This link is essential to an under- 
standing of sample survey theory, whose primary aim is to make 
sampling practice more efficient and economical. In the same way, 
the book presents some of the recommendations about sampling prac- 
tice that follow from the results in theory. I have deliberately re- 
frained, however, from making these recommendations too specific or 
too strong. The tendency in sampling practice, where decisions must 
often be made quickly on inadequate knowledge, is to develop a series 
of working rules, each of which has some basis in theory. There is 
danger, however, that working rules which have been successful in one 
type of sampling become entrenched, so that they are relied upon in 
quite different kinds of sampling for which they are not appropriate. 
Re-examination from time to time of the theoretical basis for any pro- 
posed working rule helps to avoid this danger. 

The choice of a system of notation is a perplexing one to the writer. 
The chief problem is how to prevent an epidemic of subscripts, which 
make the results look formidable and unattractive. With multistage 
stratified sampling, several symbols are needed to remind the reader 
of the structure of the population, and, ideally, the notation adopted 
for an estimate computed from sample data should remind him not 
only of the way in which the estimate is made, but also of the way in 
which the sample is drawn. My approach has been to use capital 
letters for characteristics of the population and small letters for those 
of the sample, and to employ a consistent set of subscripts to denote 
the structure of the population. For the rest, subscripts with a mne- 
monic content have been favored, and I have not hesitated to repeat 
the definition of some notation in places where my guess is that the 
reader will have begun to forget it. Lapses from consistency occur: 
the alphabet soon becomes used up, and the letter m, for instance, is 
worked overtime. Although I hope that any inconsistencies will not 
be troublesome, the reader who is puzzled by them has my apologies 
and sympathy; the struggle to understand a theorem without knowing
clearly what the symbols mean is highly exasperating. 

My best thanks are due to Dr. A. L. Finkner and Dr. Emil H. Jebe, 
who prepared a large part of the mimeographed lecture notes from 
which this book was developed. Dr. Jebe made a painstaking reading 
of the present book in manuscript, and Dr. Helen Abbey also read 
parts of the manuscript. The secretarial staff and graduate students 
of the Department of Biostatistics, Johns Hopkins University, gave 
invaluable help in the preparation of manuscript and in proofreading. 
For permission to use data from surveys I am indebted to Dr. F. C. 
Cornell and Dr. Finkner. Some theoretical investigations were facili- 
tated by a research contract with the Office of Naval Research. While 
the manuscript was nearing completion, I had the advantage of read- 
ing a substantial part of the book, Sample Survey Methods and Theory, 
by M. H. Hansen, W. N. Hurwitz, and W. G. Madow, and of noting 
how these authors had handled the inevitable points at which a lucid 
exposition is hard to find. Numerous references to this fine book would 
have been made if it had appeared in print in sufficient time. 

The present book contains more material than can be covered in the 
time usually devoted to a course on sample surveys. However, the 
sections have been prepared so that many of them can be omitted, or 
condensed to a brief statement of the results, without detriment to 
later parts of the book. There are, for instance, numerous discus- 
sions of special topics, which attempt to answer questions that have 
been raised by alert sampling practitioners but which are not essen- 
tial to a firm understanding of the fundamentals of the subject. Al- 
though the selection of topics for discussion must depend on the field 
of application, the following suggestions are made of sections which 
may be omitted or condensed in an introductory course: 3.5, 3.9; 4.7; 
5.8, 5.10, 5.12, 5.16, 5.17, 5.21; 6.8, 6.10; 7.4, 7.6, 7.7, 7.8, 7.9; 8.5, 
8.6, 8.7, 8.9, 8.14; 9.5, 9.6, 9.12, 9.13; 10.4, 10.5, 10.8, 10.9; 11.9, 11.10, 
11.11, 11.12, 11.13; 12.11; 13.4.

WILLIAM G. COCHRAN 


The Johns Hopkins University 
March, 1953 


CHAPTER 1
INTRODUCTION 


1.1 Advantages of the sampling method. Our knowledge, our atti- 
tudes, and our actions are based to a very large extent upon samples. 
This is equally true in everyday life and in scientific research. A 
person’s opinion of an institution that conducts thousands of trans- 
actions every day is often determined by the one or two encounters 
which he has had with the institution in the course of several years. 
The traveller who spends 10 days in a foreign country and then pro- 
ceeds to write a book telling the inhabitants how to revive their in- 
dustries, reform their political system, balance their budget, and im- 
prove the food in their hotels is a familiar figure of fun. But in a real 
sense he differs from the political scientist who devotes 20 years to 
living and studying in the country only in that he bases his conclusions 
on a much smaller sample of experience and is less likely to be aware 
of the extent of his ignorance. In every branch of science we lack the
resources to study more than a fragment of the phenomena that might 
advance our knowledge. 

Until recent years, relatively little attention was given to the prob- 
lem of how to draw a good sample. This does not matter so long as 
the material from which we are sampling is uniform, so that any kind 
of sample gives almost the same results. Laboratory diagnoses about 
the state of our health are made from a few drops of blood. This 
procedure is based on the assumption that the circulating blood is 
always well mixed and that one drop tells the same story as another— 
an assumption which we as laymen fervently hope is correct. But 
when the material is far from uniform, as is often the case, the method 
by which the sample is obtained is critical, and the study of techniques 
that ensure a trustworthy sample becomes important. 

This book contains an account of the body of theory that has been 
built up to provide a background for good sampling methods. In 
most of the applications for which this theory was constructed, the 
aggregate about which information is desired is finite and delimited— 
the inhabitants of a town, the machines in a factory, the fish in a lake. 
In some cases it may seem feasible to obtain accurate information by 
taking a complete enumeration or census of the aggregate. Administrators
who have been accustomed to dealing with censuses have sometimes
been suspicious of samples and reluctant to use them in place
of censuses. Although this attitude is losing ground, it may be well 
to list the principal advantages of sampling as compared with complete 
enumeration. 

i. Reduced cost. If data are secured from only a small fraction of 
the aggregate, expenditures may be expected to be smaller than if a 
complete census is attempted. 

ii. Greater speed. For the same reason, the data can be collected 
and summarized more quickly with a sample than with a complete 
count. This may be a vital consideration when the information is 
urgently needed. 

iii. Greater scope. In certain types of inquiry, highly trained per- 
sonnel or specialized equipment, limited in availability, must be used 
to obtain the data. A complete census may then be impracticable: 
the choice lies between obtaining the information by sampling or not
at all. Thus surveys which rely on sampling have more scope and 
flexibility as to the types of information that can be obtained. On 
the other hand, if information is wanted for many subdivisions or 
segments of the population, it may be found that a complete enumera- 
tion offers the best solution. 

iv. Greater accuracy. Because personnel of higher quality can be 
employed and can be given intensive training, a sample may actually 
produce more accurate results than the kind of complete enumeration 
that it is feasible to take.


1.2 The principal steps in a sample survey. As a preliminary to a
discussion of the role which theory plays in a sample survey, it is 
convenient to describe briefly the steps that are usually involved in 


[...] are suspicious of a stranger, and
very suspicious of an inquisitive stranger. Problems which are baf- [...]


i. Statement of the objectives of the survey. A lucid statement of 
the objectives is most helpful. Without this, it is easy in a complex 
survey to forget the objectives when engrossed in the details of planning,
and to make decisions that are at variance with the objectives.

ii. Definition of the population to be sampled. The word population 
will be used to denote the aggregate from which the sample is chosen. 
The definition of the population may present no problem, as when 
sampling a batch of electric light bulbs in order to estimate the aver- 
age length of life of a bulb. In sampling a population of farms, on the 
other hand, rules must be set up to define a farm, and borderline cases 
will arise. These rules must be usable in practice: the enumerator 
must be able to decide in the field, without much hesitation, whether 
a doubtful case belongs to the population or not. 

Whenever possible, the population to be sampled should obviously 
coincide with the population about which information is wanted. 
Sometimes this requirement is judged, rightly or wrongly, to be too 
difficult. In a new area of research, where the collection of data pre- 
sents perplexing problems of measurement, it may be decided to con- 
centrate the resources on this aspect of the survey, choosing a popula- 
tion that is compact and easy to sample, although this is not the broader 
population about which information is really wanted. In this event 
one should also collect any comparative information about the two 
populations that helps to show whether inferences to the broader pop- 
ulation can be attempted. 

iii. Determination of the data to be collected. It is well to verify 
that all the data are relevant to the purpose of the survey, and that 
no essential data are omitted. There is frequently a tendency to col- 
lect too many data, some of which are never subsequently examined. 

iv. Methods of measurement. When the kinds of data that are 
needed have been decided, there may be a choice as to the methods of 
measurement to be employed. For instance, data about a person’s
state of health may be obtained from statements which he makes or
from a more or less thorough medical examination. With human
populations, the manner and the order in which questions are asked
may produce substantial differences in the results: see e.g. Payne
(1951). 

v. Choice of sampling unit. As a preliminary to the selection of 
a sample, the population must be subdivided in some way into parts 
which will be called sampling units, or units. The sampling units 
must together comprise the whole of the population, and they must 
be non-overlapping, in the sense that every element in the population 
belongs to one and only one unit. Sometimes the appropriate unit is 
obvious, as with a population of light bulbs, where the unit is the 
single bulb. Sometimes there is a considerable choice of unit. In 
sampling the people in a town, the unit might be an individual person,
the members of a household, or all persons dwelling in the same
city block. In sampling an agricultural crop, the unit is likely to be 
an area of land whose shape and dimensions are at our disposal. 

The construction of a complete list of sampling units, sometimes 
called a frame, may be one of the major practical problems. Sometimes 
the frame is impossible to construct, as with the population of fish in 
a lake.

vi. Selection of the sample. There is now a variety of procedures 
by which the sample may be selected. The selection involves also a 
decision about the size of the sample, which in turn requires a provi- 
sional estimate of the cost of the survey, to ensure that the sample 
will fall within the allowable budget. 

vii. Organization of the field work. In extensive surveys, many 
problems of business administration are involved. The personnel 
must receive training in the purpose of the survey and in the methods 
of measurement to be employed and must be adequately supervised 
in their work. A procedure for early checking of the quality of the 
returns may be invaluable. Plans must be made for handling non- 
response, that is, the failure of the enumerator to obtain information 
from certain of the units in the sample. 

viii. Summary and analysis of the data. The first step is to edit the 
completed questionnaires, in the hope of amending recording errors, 
or at least of deleting data that are obviously erroneous. Decisions 
about tabulating procedure are needed in the case where answers to 
certain questions were omitted by some respondents or had to be de- 
leted in the editing process. Thereafter, the tabulations which lead 
to the estimates are performed. Different methods of estimation may 
be available for the same data. 

ix. Information gained for future surveys. The more information 
we have initially about a population, the easier it is to devise a sam- 
ple which will give accurate estimates. Any completed sample is po- 
tentially a guide to improved future sampling, through the data which 
it supplies about the means, standard deviations, and nature of the 
variability of the principal measurements, and about the costs involved
in getting the data. Sampling practice advances more rapidly
when provisions are made to assemble and record information of this
type.

There is another important respect in which any completed sample
facilitates future samples. Things never go exactly as planned in a
complex survey. The alert sampler learns to recognize mistakes in
execution and to see that they do not occur in future surveys.


1.3 The role of sampling theory. This list of the steps in a sample
survey has been given in order to emphasize that sampling is a prac- 
tical business, which calls for several different types of skill. In some 
of the steps—the definition of the population, the determination of 
the data to be collected and of the methods of measurement, and the 
organization of the field work—sampling theory plays at most a minor 
role. Although these topics will not be discussed further in this book,
their importance should be realized. Sampling demands attention to 
all phases of the activity: poor work in one phase may ruin a survey 
in which everything else is done well. 

The purpose of sampling theory is to make sampling more efficient. 
It attempts to develop methods of sample selection and of estimation 
that provide, at the lowest possible cost, estimates that are precise 
enough for our purpose. This principle of specified precision at mini- 
mum cost recurs repeatedly in the presentation of theory. 

In order to apply this principle, we must be able to predict, for any 
sampling procedure that is under consideration, the precision and the 
cost to be expected. So far as precision is concerned, we cannot fore- 
tell exactly how large an error will be present in an estimate in any 
specific situation, for this would require a knowledge of the true value 
for the population. Instead, the precision of a sampling procedure is 
judged by examining the frequency distribution which is generated 
for the estimate, if the procedure is applied again and again to the
same population. This is, of course, the standard technique by which 
precision is judged in statistical theory. 

A further simplification is introduced. With samples of the sizes 
that are common in practice, there is often good reason to suppose 
that the sample estimates are approximately normally distributed. 
Consequently the sampling variance of the estimate is used to provide, 
in inverse terms, a measure of its precision. A considerable part of 
the theory deals with the calculation of formulas for the sampling 
variances of estimates obtained by various procedures. 

The study of sampling from an infinite population is a relatively old 
and well-established discipline. The development of theory specifically 
for application to sample surveys is quite recent. Nearly all the ref- 
erences in this book are less than 20 years old and the majority are 
less than 10 years old. The primary stimulus to sample survey theory 
was the increasing use of sample surveys as a means of obtaining infor- 
mation. Most of the work in sample survey theory has been done by 
persons who are also actively engaged in the conduct of surveys. In
their turn, the advances in theory increased the scope and utility of 
the sampling method and contributed to a further growth in the practical
use of surveys.*

One difference between sample survey theory and the older theory 
of sampling is that the populations with which we have to deal in 
survey work contain a finite number of units. The methods used to 
prove theorems are different, and the results are slightly more com- 
plicated, when sampling is from a finite instead of an infinite popula- 
tion. For practical purposes these differences in results for finite and 
infinite populations are seldom important. Whenever the size of the 
sample is small relative to the size of the population, as happens in the 
great majority of applications, results derived from an infinite popula- 
tion are fully adequate. In general, results for finite populations will 
be presented in this book. In some of the more difficult problems, the 
theory for infinite populations will be used to simplify the presentation. 


1.4 Probability sampling. All sampling procedures for which a theory 
has been developed have the following mathematical properties in 
common: 

i. We are able to define the set of distinct samples, $S_1, S_2, \ldots, S_\nu$,
which the procedure is capable of selecting if applied to a specific
population. This means that we can say precisely what sampling units
belong to $S_1$, to $S_2$, and so on. For example, suppose that the
population contains six units, numbered from 1 to 6. A common procedure
for choosing a sample of size 2 gives three possible candidates:
$S_1 \sim (1, 4)$; $S_2 \sim (2, 5)$; $S_3 \sim (3, 6)$. Note that not all possible
samples of size 2 need be represented.

ii. Each possible sample $S_i$ has assigned to it a known probability
of selection $\pi_i$.

iii. We select one of the $S_i$ by a process in which each $S_i$ receives its
appropriate probability $\pi_i$ of being selected. In the example we might
assign equal probabilities to the three samples. Then the draw itself
can be made by choosing a random number between 1 and 3. If this
number is $j$, $S_j$ is the sample that is taken.

iv. The method for computing the estimate from the sample must 
be stated and must lead to a unique estimate for any specific sample. 
We may declare, for example, that the estimate is to be the average 
of the measurements on the individual units in the sample. 

For any sampling procedure which satisfies these properties, we are 
in a position to calculate the frequency distribution of the estimates 
which it generates if repeatedly applied to the same population, for
we know how frequently any particular sample $S_i$ will be selected, and
we know exactly how to calculate the estimate from the data in $S_i$.
It is clear, therefore, that we are able to develop a sampling theory
for any procedure of this type, although the details of the development
may be intricate.

* Stephan (1948) gives a good historical account of the development of the uses
of modern sampling techniques.
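
The calculation can be sketched for the six-unit example of this section, with the three samples (1, 4), (2, 5), (3, 6) each selected with probability 1/3 and the sample average used as the estimate. The measurements in `y` are hypothetical values chosen only for illustration; they do not come from the text.

```python
from fractions import Fraction

# Hypothetical measurements on the six units (illustrative only).
y = {1: 2, 2: 4, 3: 9, 4: 6, 5: 8, 6: 7}

# The three samples the procedure can select, each with probability 1/3.
samples = {"S1": (1, 4), "S2": (2, 5), "S3": (3, 6)}
prob = {name: Fraction(1, 3) for name in samples}

def estimate(units):
    """Estimate = average of the measurements on the units in the sample."""
    return Fraction(sum(y[u] for u in units), len(units))

# Frequency distribution of the estimate under repeated application:
# each possible value of the estimate paired with its probability.
distribution = {name: (estimate(units), prob[name])
                for name, units in samples.items()}

# Mean and variance of that distribution, and the population mean.
mean_of_estimates = sum(z * p for z, p in distribution.values())
variance = sum(p * (z - mean_of_estimates) ** 2
               for z, p in distribution.values())
population_mean = Fraction(sum(y.values()), len(y))

for name, (z, p) in distribution.items():
    print(name, "estimate =", z, "probability =", p)
print("mean of estimates:", mean_of_estimates)
print("sampling variance:", variance)
print("population mean:", population_mean)
```

Exact fractions are used so that the mean and variance of the distribution can be compared with the population value without rounding error.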

The term probability sampling refers to a procedure of this type. 
This term has not yet acquired a standard definition, and some writers
use it in a more restrictive sense. The main purpose of the term is to 
distinguish this kind of sampling from purposive selection, in which 
the sample is restricted to units thought by someone to be especially 
typical of the population or convenient for sampling. Purposive se- 
lection may produce good results when the sample is small, but it is 
not amenable to the development of a theory, because it contains no 
element of random selection. 

In practice we seldom draw a sample by writing down the $S_i$ and
$\pi_i$ as outlined above. This is intolerably laborious with a large
population, where a sampling procedure may produce billions of possible
samples. The draw is most commonly made by specifying probabili- 
ties of inclusion for the individual units, and drawing units, one by 
one or in groups, until the sample of desired size and type is con- 
structed. For the purposes of a theory it is sufficient to know that we 
could write down the $S_i$ and $\pi_i$ if we wanted to and had unlimited time.


1.5 Bias and its effects. For simplicity, it is assumed in the presentation
of theory that any measurement $y_i$ on the $i$th unit is the correct
value for that unit. Errors of measurement are ignored. This
assumption is of course unrealistic, and in chapter 13 the effects of 
errors of measurement on the standard results are examined. For 
some types of error, the standard results remain valid with only minor 
changes. For other types of error, more drastic changes are needed. 

The effects of bias will, however, be discussed in this section, be- 
cause the deliberate use of biased estimates is often found to be profit- 
able in sample surveys. 

A sampling procedure is said to be unbiased if the mean of the fre- 
quency distribution of the estimates which it produces is exactly equal 
to the population characteristic which is being estimated. In the
notation of the previous section, let $z_i$ be the estimate provided by the
sample $S_i$ ($i = 1, 2, \ldots, \nu$), with probability of selection $\pi_i$, and let
$\mu$ be the population value which is being estimated. The procedure
is unbiased if

$$\sum_{i=1}^{\nu} \pi_i z_i = \mu$$

If the two quantities are not equal, their difference is called the bias
in the sampling procedure:

$$\text{Bias} = \sum_{i=1}^{\nu} \pi_i z_i - \mu$$
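
As a small numerical sketch of this definition, the bias can be computed directly once the estimates and their selection probabilities are listed. The three estimates, the unequal probabilities, and the true value below are hypothetical figures chosen for illustration.

```python
from fractions import Fraction

# Hypothetical estimates z_i and selection probabilities pi_i for nu = 3 samples.
z = [Fraction(4), Fraction(6), Fraction(8)]
pi = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
mu = Fraction(6)  # hypothetical true population value

# The pi_i must add to one over the set of possible samples.
assert sum(pi) == 1

# Mean of the frequency distribution of the estimate.
expected_estimate = sum(p * zi for p, zi in zip(pi, z))

# Bias = sum(pi_i * z_i) - mu; zero would mean the procedure is unbiased.
bias = expected_estimate - mu

print("expected estimate:", expected_estimate)
print("bias:", bias)
```

With equal probabilities of 1/3 the same three estimates would average exactly 6, so in this illustration the bias arises solely from the unequal selection probabilities.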


To examine the effect of bias, suppose that the estimate $z$ is normally
distributed about a mean $m$ which is a distance $B$ from the true
population value $\mu$, as shown in figure 1.1. The amount of bias is
$B = m - \mu$.

[Figure 1.1 Effect of bias on errors of estimation.]

Suppose that we do not know that any bias is present. We
compute the standard deviation $\sigma$ of the frequency distribution of the
estimate; this will, of course, be the standard deviation about the
mean $m$ of the distribution, not about the true mean $\mu$. As a statement
about the accuracy of the estimate, we declare that the probability
is 0.05 that the estimate $z$ is in error by more than $1.96\sigma$.

We will consider how the presence of bias distorts this probability.
To do this, we calculate the true probability that the estimate is in
error by more than $1.96\sigma$, where error is measured from the true mean
$\mu$. The two tails of the distribution must be examined separately.
For the upper tail, the probability of an error of more than $+1.96\sigma$ is
the shaded area above Q in figure 1.1. This area is given by

$$\frac{1}{\sigma\sqrt{2\pi}} \int_{\mu + 1.96\sigma}^{\infty} e^{-\frac{1}{2}(z-m)^2/\sigma^2} \, dz$$

Put $z - m = \sigma t$. The lower limit of the range of integration for
$t$ is

$$\frac{\mu - m}{\sigma} + 1.96 = 1.96 - \frac{B}{\sigma}$$

Thus the area is

$$\frac{1}{\sqrt{2\pi}} \int_{1.96 - (B/\sigma)}^{\infty} e^{-\frac{1}{2}t^2} \, dt$$

Similarly, the lower tail, i.e. the shaded area below P, has an area

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{-1.96 - (B/\sigma)} e^{-\frac{1}{2}t^2} \, dt$$


From the form of the integrals it is clear that the amount of dis- 
turbance depends solely on the ratio of the bias to the standard devia- 
tion. The results are shown in table 1.1. 


TABLE 1.1 EFFECT OF A BIAS $B$ ON THE PROBABILITY OF AN ERROR
GREATER THAN $1.96\sigma$

                   Probability of error
$B/\sigma$     $< -1.96\sigma$     $> 1.96\sigma$     Total
0.02           0.0238              0.0262             0.0500
0.04           0.0228              0.0274             0.0502
0.06           0.0217              0.0287             0.0504
0.08           0.0207              0.0301             0.0508
0.10           0.0197              0.0314             0.0511
0.20           0.0154              0.0392             0.0546
0.40           0.0091              0.0594             0.0685
0.60           0.0052              0.0869             0.0921
0.80           0.0029              0.1230             0.1259
1.00           0.0015              0.1685             0.1700
1.50           0.0003              0.3228             0.3231
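
The entries of table 1.1 follow from the two normal tail areas, one beyond $1.96 - B/\sigma$ and one beyond $1.96 + B/\sigma$. The sketch below evaluates them with the complementary error function from the Python standard library; the function names are illustrative and not part of the text.

```python
from math import erfc, sqrt

def upper_tail(x):
    """P(T > x) for a standard normal variable T."""
    return 0.5 * erfc(x / sqrt(2))

def error_probabilities(ratio):
    """Return (lower, upper, total) for a bias-to-standard-deviation ratio B/sigma."""
    # Overestimate by more than 1.96 sigma: area above 1.96 - B/sigma.
    upper = upper_tail(1.96 - ratio)
    # Underestimate by more than 1.96 sigma: area below -1.96 - B/sigma,
    # which by symmetry equals the area above 1.96 + B/sigma.
    lower = upper_tail(1.96 + ratio)
    return lower, upper, lower + upper

for ratio in (0.02, 0.10, 0.20, 1.00, 1.50):
    lower, upper, total = error_probabilities(ratio)
    print(f"{ratio:4.2f}  {lower:.4f}  {upper:.4f}  {total:.4f}")
```

Because only the ratio $B/\sigma$ enters the limits of integration, a single argument is enough to reproduce any row of the table.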


For the total probability of an error of more than $1.96\sigma$, the bias
has little effect provided that it is less than one-tenth of the standard
deviation. At this point the total probability is 0.0511 instead of the
0.05 which we think it is. As the bias increases further, the disturbance
becomes more serious. At $B = \sigma$, the total probability of error
is 0.17, more than three times the presumed value.


The two tails are affected differently. With a positive bias, as in
this example, the probability of an underestimate by more than $1.96\sigma$
shrinks rapidly from the presumed 0.025 to become negligible when
$B = \sigma$. The probability of the corresponding overestimate mounts
steadily. In most applications the total error is the primary interest,
but occasionally we are particularly interested in errors in one direction.

As a working rule, the effect of bias on the accuracy of an estimate 
is negligible if the bias is less than one-tenth of the standard deviation 
of the estimate. If we have a biased method of estimation for which 
we can show that $B/\sigma < 0.1$, it can be claimed that the bias is not an
appreciable disadvantage of the method. 

Any biased method must, however, be used with caution. Suppose 
that samples are drawn from a population every month throughout a 
year, and that monthly estimates are made by some biased method of 
estimation. The arithmetic mean of the twelve estimates is computed 
subsequently in order to obtain an average annual figure. If the pop- 
ulation is changing only slowly, it is not unlikely that the biases in 
the twelve estimates will have the same sign and be of about the same 
magnitude. The bias in the annual average is therefore about the 
same as the bias in a single monthly figure. If the monthly samples 
are drawn independently, the standard error of the annual average 
estimate will be about $1/\sqrt{12}$ times the standard error of a monthly
estimate. Hence the ratio of the bias to the standard error in the
annual average is roughly $\sqrt{12}$ times that in a monthly figure, and
this inflated ratio may not be negligible. Since it is always difficult
to foretell all the ways in which sample estimates may be averaged for
later purposes, the use of biased estimates is to be avoided unless there
is evidence that the ratio of the bias to the standard error is extremely 
small. 
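
The arithmetic of the monthly example can be checked in a few lines. The monthly ratio of 0.05 used here is a hypothetical figure, chosen because it lies comfortably inside the one-tenth working rule for a single month.

```python
from math import sqrt

monthly_ratio = 0.05   # hypothetical B/sigma for one monthly estimate
months = 12

# Averaging leaves the bias about the same, while the standard error of
# the average of independent estimates shrinks by a factor 1/sqrt(12).
# The ratio of bias to standard error is therefore multiplied by sqrt(12).
annual_ratio = monthly_ratio * sqrt(months)

print(f"monthly B/sigma: {monthly_ratio:.3f}")
print(f"annual  B/sigma: {annual_ratio:.3f}")
print("annual ratio exceeds 0.1:", annual_ratio > 0.1)
```

Under these assumptions a bias that is negligible in each monthly figure is no longer negligible in the annual average.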

Unsuspected bias may be present in an estimate, even when great 
pains have been taken to exclude bias. Since the standard deviation 
of an estimate, as obtained from a sample, does not include the con- 
tribution of the bias, it is preferable to speak of this standard devia- 
tion as measuring the precision of the estimate, rather than its ac- 
curacy. Accuracy usually refers to the size of deviations from the 
true mean $\mu$, whereas precision refers to the size of deviations from the
mean m obtained by repeated application of the sampling procedure. 


1.6 References. 


CHAPTER 2
SIMPLE RANDOM SAMPLING 


2.1 Simple random sampling. Sample surveys deal with samples 
drawn from populations which contain a finite number N of units. 
If these units can all be distinguished from one another, the number 
of distinct samples of size $n$ that can be drawn from the $N$ units is
given by the combinatorial formula

$${}_N C_n = \frac{N!}{n!\,(N-n)!}$$


For example, if the population contains five units denoted by A, B, 
C, D, and E, there are ten different samples of size 3, as follows:


ABC ABD ABE ACD ACE 
ADE BCD BCE BDE CDE 


Note that the same letter is not allowed to occur twice in the sample. 
No attention is paid to the order in which the letters occur in the 
sample, the six samples ABC, ACB, BAC, BCA, CAB, and CBA be- 
ing considered identical. 
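
The enumeration can be reproduced with the Python standard library, which generates unordered selections without repetition. The variable names are illustrative.

```python
from itertools import combinations
from math import comb

units = "ABCDE"   # the five units of the example
n = 3             # sample size

# combinations() never repeats a unit and ignores order, so ABC, ACB,
# BAC, and so on are produced only once, as ABC.
samples = ["".join(s) for s in combinations(units, n)]

print(" ".join(samples))
print("number of distinct samples:", len(samples))
print("N! / (n! (N - n)!):", comb(len(units), n))
```

The count of generated samples agrees with the combinatorial formula for 5 units taken 3 at a time.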

Simple random sampling is a method of selecting n units out of the 
N such that every one of the ${}_N C_n$ samples has an equal chance of
being chosen. This type of sampling is sometimes called random 
sampling. Since the word random is used in the literature in many 
different. senses, an extra qualifying adjective is advisable. Some 
writers prefer the phrase unrestricted random sampling. 

In practice a simple random sample is drawn unit by unit. The 
units in the population are numbered from 1 to N. A series of random 
numbers between 1 and N is then drawn, either by means of a table 
of random numbers or by placing the numbers 1 to N in a bowl and 
mixing thoroughly. If the bowl is used, n numbers are drawn out in 
succession. The units which bear these numbers constitute the sample. 
At any stage in the draw, this process gives an equal chance of
selection to all numbers not previously drawn. It is easy to verify that
all ${}_N C_n$ possible samples have an equal chance.

When a number has been drawn from the bowl, it is not replaced,
since this might allow the same unit to enter the sample more than
once. For this reason the sampling is described as without replacement.
Similarly, if a table of random numbers is employed, a number that 
has been drawn previously is ignored. Sampling with replacement is 
entirely feasible, but except in special circumstances is seldom used, 
since there seems little point in having the same unit twice in the
sample. 
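
The claim that unit-by-unit drawing without replacement gives every
sample the same chance can be checked by direct enumeration. The sketch
below is an illustration only: the function name sample_probabilities is
hypothetical, and the five-unit population A to E is the one used in the
example of section 2.1. It adds up the chances of all the ordered draws
that lead to the same unordered sample.

```python
from fractions import Fraction
from itertools import permutations
from math import comb

def sample_probabilities(units, n):
    """Exact chance of each unordered sample when n units are drawn one
    at a time, every draw giving an equal chance to the units not yet drawn."""
    probs = {}
    for ordered in permutations(units, n):
        p = Fraction(1)
        for k in range(n):
            p *= Fraction(1, len(units) - k)   # chance of the (k+1)th draw
        key = frozenset(ordered)               # order within the sample is ignored
        probs[key] = probs.get(key, Fraction(0)) + p
    return probs

probs = sample_probabilities("ABCDE", 3)
print(len(probs), comb(5, 3))      # number of distinct samples
print(set(probs.values()))         # the distinct probabilities that occur
```

Exact fractions are used so that the equality of the probabilities is
not blurred by rounding.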

Other methods of sampling are often preferable to simple random 
sampling on the grounds of convenience or of increased precision. 
Simple random sampling serves best to introduce sampling theory. 


2.2 Definitions and notation. In a sample survey we decide upon 
certain properties which we attempt to measure and record for every 
unit that comes into the sample. These properties of the units will 
be referred to as characteristics or more simply as items. 

The values obtained for any specific item in the N units which
comprise the population are denoted by $y_1, y_2, \ldots, y_N$. The
corresponding values for the units in the sample are denoted by
$y_1, y_2, \ldots, y_n$ or, if we wish to refer to a typical sample
member, by $y_i$ ($i = 1, 2, \ldots, n$). Note that the sample will not
consist of the first n units in
the population, except in the instance, usually rare, in which these 
units happen to be drawn. If this point is kept in mind, my experi- 
ence has been that no confusion need result. 

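Capital letters will refer to characteristics of the population, and
lower case letters to those of the sample. For totals and means we
have the following definitions:

Total:   Population value:  $Y = y_1 + y_2 + \cdots + y_N$
         Sample value:      $y = y_1 + y_2 + \cdots + y_n$                      (2.1)

Mean:    Population value:  $\bar{Y} = \frac{y_1 + y_2 + \cdots + y_N}{N} = \frac{Y}{N}$
         Sample value:      $\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n} = \frac{y}{n}$      (2.2)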


One unusual feature of this notation is the use of the symbol y to
denote the sample total of the values $y_i$. In statistical literature, y
serves as a general symbol for the variate itself, as in the phrase the
frequency distribution of y. Instead, we shall refer to the frequency
distribution of $y_i$, reserving the symbol y for the sample total.

Although sampling is undertaken for many different purposes,
interest centers most frequently on three characteristics of the
population. The first is the total Y of the values for some item over all
units in the population (e.g. the total number of acres of wheat in a
region). The second is the average value $\bar{Y}$ per unit (e.g. the
average number
of acres of wheat per farm). The third is the proportion or percentage 
of units which fall into some defined class (e.g. the percentage of 
farms growing no wheat). Estimation of the population total and 
mean will be considered in this chapter. 

The symbol ^ is used to denote an estimate of a population charac- 
teristic made from a sample. In this chapter only the simplest types 
of estimate are considered, as follows: 

                     Estimate

Population mean:     $\hat{\bar{Y}} = \bar{y}$ = sample mean

Population total:    $\hat{Y} = N\bar{y} = Ny/n$


The factor N/n by which the sample total is multiplied is called
variously the expansion or raising or inflation factor. Its inverse, n/N,
is of course the ratio of the size of the sample to that of the population,
and is called the sampling ratio or the sampling fraction.


2.3 Properties of the estimates. The precision of any estimate made 
from a sample depends both on the method by which the estimate is 
calculated from the sample data and on the plan of sampling. To
save space we sometimes write of “the precision of the sample mean” 
or “the precision of simple random sampling,” without specifically 
mentioning the other fundamental factor. This has been done, we 
hope, only in instances in which it is clear from the context what the 
missing factor is. When studying any formula that is presented, the 
reader should make sure that he knows the specific method of sampling 
and method of estimation for which the formula has been established. 

In this book, a method of estimation is called consistent if the esti- 
mate becomes exactly equal to the population value when n = N, 
that is, when the sample consists of the whole population. For
simple random sampling it is obvious that $\bar{y}$ and $N\bar{y}$ are
consistent estimates of the population mean and total, respectively.
Consistency is
a desirable property of estimates. On the other hand, an inconsistent 
estimate is not necessarily useless, since it may give satisfactory pre- 
cision when n is small compared to N. Its utility is likely to be con- 
fined to this situation. 

In statistical theory the notion of consistency has been discussed 
mainly for infinite populations. The usual definition is that $\bar{y}$ is a
consistent estimate of $\bar{Y}$ if for any $\epsilon > 0$,

\[ \lim_{n \to \infty} \Pr\{ |\bar{y} - \bar{Y}| > \epsilon \} = 0 \]

There is more than one way in which this definition can be adapted so
as to apply to a finite population, and the definition which we have 
given may not be the most useful one for studying the properties of 
estimates in large samples. However, the idea of consistency does not 
play an important part in the subsequent exposition. 

As we have seen, a method of estimation is unbiased if the average
value of the estimate, taken over all possible samples of given size n,
is exactly equal to the true population value. If the method is to be
unbiased without qualification, this result must hold for any
population of finite values $y_i$ and for any n. To investigate whether
$\bar{y}$ is unbiased with simple random sampling, we calculate the value
of $\bar{y}$ for all ${}_N C_n$ samples and find the average of the estimates.
The symbol E denotes this average over all possible samples.


Theorem 2.1 The sample mean $\bar{y}$ is an unbiased estimate of $\bar{Y}$.
Proof: By its definition

\[ E\bar{y} = \frac{\sum (y_1 + y_2 + \cdots + y_n)/n}{N!/[n!\,(N-n)!]} \tag{2.3} \]

where the sum extends over all ${}_N C_n$ samples. To evaluate this sum,
we find out in how many samples any specific value $y_i$ appears. Since
there are (N - 1) other units available for the rest of the sample and
(n - 1) other places to fill in the sample, the number of samples
containing $y_i$ is

\[ \frac{(N-1)!}{(n-1)!\,(N-n)!} \]

Hence

\[ \sum (y_1 + y_2 + \cdots + y_n)
   = \frac{(N-1)!}{(n-1)!\,(N-n)!}\,(y_1 + y_2 + \cdots + y_N) \]

From (2.3) this gives

\[ E\bar{y} = \frac{(N-1)!}{(n-1)!\,(N-n)!} \cdot \frac{n!\,(N-n)!}{n\,N!}\,
              (y_1 + y_2 + \cdots + y_N)
            = \frac{y_1 + y_2 + \cdots + y_N}{N} = \bar{Y} \tag{2.4} \]

Corollary. $\hat{Y} = N\bar{y}$ is an unbiased estimate of the population
total Y.
A less cumbersome proof of theorem 2.1 is obtained as follows.
Since every unit appears in the same number of samples, it is clear
that

$E(y_1 + y_2 + \cdots + y_n)$ must be some multiple of $(y_1 + y_2 + \cdots + y_N)$
(2.5)

The multiplier must be n/N, since the expression on the left has n
terms and that on the right has N terms. This leads to the result.
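
For a small population the averaging over all possible samples in
theorem 2.1 can be carried out literally. The sketch below assumes a
made-up population of five values (not taken from the text) and checks
both the counting step of the proof and the equality of the average
sample mean with the population mean.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

population = [2, 5, 9, 4, 10]      # hypothetical values y_1, ..., y_N
N, n = len(population), 3

# Every distinct sample, identified by the positions of its units
samples = list(combinations(range(N), n))
sample_means = [Fraction(sum(population[i] for i in s), n) for s in samples]

expected_mean = sum(sample_means) / len(samples)   # E(ybar) over all samples
population_mean = Fraction(sum(population), N)     # Ybar

# Number of samples containing one specific unit (here the first),
# to be compared with (N-1)! / [(n-1)! (N-n)!]
containing_first = sum(1 for s in samples if 0 in s)

print(expected_mean, population_mean)
print(containing_first, comb(N - 1, n - 1))
```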


2.4 Variances of the estimates. The variance of the $y_i$ in a finite
population is defined as

\[ \sigma^2 = \frac{\sum_{i=1}^{N} (y_i - \bar{Y})^2}{N} \tag{2.6} \]

As a matter of notation, results will be presented in terms of a slightly
different expression, in which the divisor (N - 1) is used instead of
N. We take

\[ S^2 = \frac{\sum_{i=1}^{N} (y_i - \bar{Y})^2}{N-1} \tag{2.7} \]

This convention has been used by those who approach sampling theory 
by means of the analysis of variance. Its advantage is that most re- 
sults take a slightly simpler form. Provided that the same notation 
is maintained consistently, all results are equivalent in either notation. 
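We now consider the variance of $\bar{y}$. By this we mean $E(\bar{y} - \bar{Y})^2$
taken over all ${}_N C_n$ samples.
Theorem 2.2 The variance of the mean $\bar{y}$ from a simple random
sample is

\[ V(\bar{y}) = E(\bar{y} - \bar{Y})^2 = \frac{S^2}{n}\,\frac{(N-n)}{N} \tag{2.8} \]

Proof:

\[ n(\bar{y} - \bar{Y}) = (y_1 - \bar{Y}) + (y_2 - \bar{Y}) + \cdots + (y_n - \bar{Y}) \tag{2.9} \]

By the same argument of symmetry as used in relation (2.5), it
follows that

\[ E[(y_1 - \bar{Y})^2 + \cdots + (y_n - \bar{Y})^2]
   = \frac{n}{N}\,[(y_1 - \bar{Y})^2 + \cdots + (y_N - \bar{Y})^2] \tag{2.10} \]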
and also that

\[ E[(y_1 - \bar{Y})(y_2 - \bar{Y}) + (y_1 - \bar{Y})(y_3 - \bar{Y})
     + \cdots + (y_{n-1} - \bar{Y})(y_n - \bar{Y})]
   = \frac{n(n-1)}{N(N-1)}\,[(y_1 - \bar{Y})(y_2 - \bar{Y}) + (y_1 - \bar{Y})(y_3 - \bar{Y})
     + \cdots + (y_{N-1} - \bar{Y})(y_N - \bar{Y})] \tag{2.11} \]


In equation (2.11) the sums of products extend over all pairs of units 
in the sample and population, respectively. The sum on the left con- 
tains n(n — 1)/2 terms, and that on the right N(N — 1)/2 terms. 

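Now square (2.9) and average over all simple random samples.
Using (2.10) and (2.11), we obtain

\[ n^2 E(\bar{y} - \bar{Y})^2 = \frac{n}{N}\Big\{ (y_1 - \bar{Y})^2 + \cdots + (y_N - \bar{Y})^2
   + \frac{2(n-1)}{(N-1)}\,[(y_1 - \bar{Y})(y_2 - \bar{Y}) + \cdots
   + (y_{N-1} - \bar{Y})(y_N - \bar{Y})] \Big\} \]

Completing the square on the cross-product term, we have

\[ n^2 E(\bar{y} - \bar{Y})^2 = \frac{n}{N}\Big[ \Big(1 - \frac{n-1}{N-1}\Big)
   \{(y_1 - \bar{Y})^2 + \cdots + (y_N - \bar{Y})^2\}
   + \frac{(n-1)}{(N-1)}\,\{(y_1 - \bar{Y}) + \cdots + (y_N - \bar{Y})\}^2 \Big] \]

The second term inside the square bracket vanishes, since the sum of
the $y_i$ equals $N\bar{Y}$. Division by $n^2$ gives

\[ V(\bar{y}) = E(\bar{y} - \bar{Y})^2
   = \frac{(N-n)}{nN(N-1)} \sum_{i=1}^{N} (y_i - \bar{Y})^2
   = \frac{S^2}{n}\,\frac{(N-n)}{N} \]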
Corollary 1 The standard error of $\bar{y}$ is

\[ \sigma_{\bar{y}} = \frac{S}{\sqrt{n}} \sqrt{\frac{N-n}{N}} \tag{2.12} \]

Corollary 2 The variance of $\hat{Y} = N\bar{y}$, as an estimate of the
population total Y, is

\[ V(\hat{Y}) = E(\hat{Y} - Y)^2 = \frac{N^2 S^2}{n}\,\frac{(N-n)}{N} \tag{2.13} \]

Corollary 3 The standard error of $\hat{Y}$ is

\[ \sigma_{\hat{Y}} = \frac{NS}{\sqrt{n}} \sqrt{\frac{N-n}{N}} \tag{2.14} \]
2.5 The finite population correction. For a random sample of size n 
from an infinite population, it is well known that the variance of the
mean is $\sigma^2/n$. The only change in this result when the population is
finite is the introduction of the extra factor, (N - n)/N. The factors
(N - n)/N for the variance and $\sqrt{(N-n)/N}$ for the standard error
are called the finite population corrections (fpc). They are given with
a divisor (N - 1) in place of N by writers who present results in
terms of $\sigma$. Provided that the sampling fraction n/N remains low,
these factors are close to unity, and the size of the population as such 
has no direct effect on the standard error of the sample mean. For 
instance, if S is the same in the two populations, a sample of 500 from 
a population of 200,000 gives almost as precise an estimate of the popu- 
lation mean as a sample of 500 from a population of 10,000. Persons 
unfamiliar with sampling often find this result very difficult to be- 
lieve, and indeed it is remarkable. To them it seems intuitively ob- 
vious that, if information has been obtained about only a very small 
fraction of the population, the sample mean just cannot be accurate.
It is instructive for the reader to consider why this point of view is 
erroneous. 

In practice the fpc can be ignored whenever the sampling fraction
does not exceed 5 per cent, and for many purposes even if it is as high
as 10 per cent. The effect of ignoring the correction is to overestimate
the standard error of the estimate $\bar{y}$.
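
A short calculation makes the size of the correction concrete. The
sketch below assumes an arbitrary value S = 10 for both populations
(an illustration, not a figure from the text) and compares the standard
error of the mean for a sample of 500 drawn from populations of 200,000
and of 10,000. It also shows the factor by which the standard error is
overstated when the fpc is dropped at sampling fractions of 5 and 10 per
cent.

```python
from math import sqrt

def fpc(n, N):
    """Finite population correction for the variance, (N - n) / N."""
    return (N - n) / N

def se_mean(S, n, N):
    """Standard error of the sample mean (corollary 1 of theorem 2.2)."""
    return S / sqrt(n) * sqrt(fpc(n, N))

S = 10.0   # hypothetical population standard deviation, the same in both cases
se_large = se_mean(S, 500, 200_000)
se_small = se_mean(S, 500, 10_000)
print(f"N = 200,000: {se_large:.4f}    N = 10,000: {se_small:.4f}")

# Overstatement of the standard error when the fpc is ignored
overstatement = {f: 1 / sqrt(1 - f) for f in (0.05, 0.10)}
print(overstatement)
```

With the sample size fixed, the two standard errors differ by only a
few per cent, which is the point made in the text.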

Ingenious methods for developing sampling theory for a finite popu- 
lation have been given by Cornfield (1944), Tukey (1950), and Wishart 
(1952).

The following theorem, which is an extension of theorem 2.2, is
not required for the discussion in this chapter, but is proved here for
later reference.

Theorem 2.3 If $y_i$, $x_i$ are a pair of variates defined on every unit
in the population, and $\bar{y}$, $\bar{x}$ are the corresponding means from a simple
random sample of size n, then their covariance

\[ E(\bar{y} - \bar{Y})(\bar{x} - \bar{X})
   = \frac{(N-n)}{nN}\,\frac{1}{(N-1)} \sum_{i=1}^{N} (y_i - \bar{Y})(x_i - \bar{X}) \tag{2.15} \]

This theorem reduces to theorem 2.2 if the variates $y_i$, $x_i$ are equal on
every unit.
Proof: Apply theorem 2.2 to the variate $u_i = y_i + x_i$. The
population mean of $u_i$ is $\bar{U} = \bar{Y} + \bar{X}$, and theorem 2.2 gives

\[ E(\bar{u} - \bar{U})^2 = \frac{(N-n)}{nN}\,\frac{1}{(N-1)} \sum_{i=1}^{N} (u_i - \bar{U})^2 \]

i.e.

\[ E\{(\bar{y} - \bar{Y}) + (\bar{x} - \bar{X})\}^2
   = \frac{(N-n)}{nN}\,\frac{1}{(N-1)} \sum_{i=1}^{N} \{(y_i - \bar{Y}) + (x_i - \bar{X})\}^2 \tag{2.16} \]

Expand the quadratic terms on both sides. By theorem 2.2,

\[ E(\bar{y} - \bar{Y})^2 = \frac{(N-n)}{nN}\,\frac{1}{(N-1)} \sum_{i=1}^{N} (y_i - \bar{Y})^2 \]

with a similar relation for $E(\bar{x} - \bar{X})^2$. Hence these two terms
cancel on the left and right sides of equation (2.16). The result of the
theorem, equation (2.15), follows from the cross-product terms.


2.6 Estimation of the standard error from a sample. The formulas 
for the standard errors of the estimated population mean and total 
are used primarily for three purposes: (i) to compare the precision 
obtained by simple random sampling with that given by other methods 
of sampling, (ii) to estimate the size of sample needed in a survey 
which is being planned, (iii) to estimate the precision actually at- 
tained in a survey that has been completed. The formulas involve 
$S^2$, the population variance. In practice this will not be known, but
it can be estimated from the sample data. The relevant result is
stated in theorem 2.4.


Theorem 2.4 For a simple random sample

\[ s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} \]

is an unbiased estimate of

\[ S^2 = \frac{\sum_{i=1}^{N} (y_i - \bar{Y})^2}{N-1} \]
Proof: We may write

\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} [(y_i - \bar{Y}) - (\bar{y} - \bar{Y})]^2
       = \frac{1}{n-1} \Big[ \sum_{i=1}^{n} (y_i - \bar{Y})^2 - n(\bar{y} - \bar{Y})^2 \Big] \]

Now average over all simple random samples of size n. By the
argument of symmetry used in theorem 2.2,

\[ E\Big\{ \sum_{i=1}^{n} (y_i - \bar{Y})^2 \Big\}
   = \frac{n}{N} \sum_{i=1}^{N} (y_i - \bar{Y})^2 = \frac{n(N-1)}{N}\,S^2 \]

by the definition of $S^2$. Further, by theorem 2.2,

\[ E\{ n(\bar{y} - \bar{Y})^2 \} = \frac{(N-n)}{N}\,S^2 \]

Hence

\[ E(s^2) = \frac{S^2}{(n-1)N}\,[n(N-1) - (N-n)] = S^2 \tag{2.17} \]
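Corollary. Unbiased estimates of the variances of $\bar{y}$ and $\hat{Y} = N\bar{y}$ are

\[ v(\bar{y}) = s_{\bar{y}}^2 = \frac{s^2}{n}\,\frac{(N-n)}{N} \tag{2.18} \]

\[ v(\hat{Y}) = s_{\hat{Y}}^2 = \frac{N^2 s^2}{n}\,\frac{(N-n)}{N} \tag{2.19} \]

For the standard errors we take

\[ s_{\bar{y}} = \frac{s}{\sqrt{n}} \sqrt{\frac{N-n}{N}}, \qquad
   s_{\hat{Y}} = \frac{Ns}{\sqrt{n}} \sqrt{\frac{N-n}{N}} \tag{2.20} \]

These estimates are slightly biased: for most applications the bias is
unimportant.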
The reader should note the symbols employed for true and
estimated variances of the estimates. Thus for $\bar{y}$ we write

True variance:        $V(\bar{y}) = \sigma_{\bar{y}}^2$
Estimated variance:   $v(\bar{y}) = s_{\bar{y}}^2$

The notation is a little redundant, but it is convenient to have
separate symbols V and v for variance, and $\sigma$ and s for standard error.
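
Theorem 2.4, and the remark that the estimated standard errors are
slightly biased, can both be seen on a small example. The sketch below
takes a made-up population of five values with n = 3 (not data from the
text), computes $s^2$ for every possible sample, and compares the
averages of $s^2$ and of s with $S^2$ and S.

```python
from fractions import Fraction
from itertools import combinations
from math import sqrt

population = [2, 5, 9, 4, 10]      # hypothetical values y_1, ..., y_N
N, n = len(population), 3

Ybar = Fraction(sum(population), N)
S2 = sum((y - Ybar) ** 2 for y in population) / (N - 1)

def sample_variance(sample):
    """s^2 with divisor (n - 1), kept as an exact fraction."""
    ybar = Fraction(sum(sample), len(sample))
    return sum((y - ybar) ** 2 for y in sample) / (len(sample) - 1)

all_s2 = [sample_variance(s) for s in combinations(population, n)]

E_s2 = sum(all_s2) / len(all_s2)                   # average of s^2
E_s = sum(sqrt(v) for v in all_s2) / len(all_s2)   # average of s

print(E_s2, S2)          # equal: s^2 is unbiased for S^2
print(E_s, sqrt(S2))     # the average of s falls below S
```

The square root is a nonlinear function, so unbiasedness of $s^2$ does
not carry over to s. That is the source of the slight bias in the
estimated standard errors.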
2.7 Confidence limits. It is usually assumed that the estimates $\bar{y}$
and $\hat{Y}$ are normally distributed about the corresponding population
values. The reasons for this assumption and its limitations are
considered in section 2.8. If the assumption holds, lower and upper
confidence limits for the population mean and total are as follows:

Mean:

\[ \hat{\bar{Y}}_L = \bar{y} - \frac{ts}{\sqrt{n}} \sqrt{\frac{N-n}{N}}, \qquad
   \hat{\bar{Y}}_U = \bar{y} + \frac{ts}{\sqrt{n}} \sqrt{\frac{N-n}{N}} \tag{2.21} \]

Total:

\[ \hat{Y}_L = N\bar{y} - \frac{tNs}{\sqrt{n}} \sqrt{\frac{N-n}{N}}, \qquad
   \hat{Y}_U = N\bar{y} + \frac{tNs}{\sqrt{n}} \sqrt{\frac{N-n}{N}} \tag{2.22} \]

The symbol t is the value of the normal deviate corresponding to the
desired confidence probability. The most common values are: 


Confidence probability (%) 50 80 90 95 99 
t 0.67 1.28 1.64 1.96 2.58 


If the sample size is less than 30, the percentage points may be taken 
from Student’s table with (n — 1) degrees of freedom, these being 
the degrees of freedom in the estimated variance $s^2$. The
t-distribution holds exactly only if the observations $y_i$ are themselves
normally
distributed and N is infinite. Moderate departures from normality do 
not affect it greatly. For small samples with very skew distributions, 
special methods are needed. 
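
The tabulated normal deviates can be reproduced from the inverse of the
normal distribution function. The sketch below uses NormalDist from the
Python standard library. The helper name normal_deviate is an
assumption made for this illustration.

```python
from statistics import NormalDist

def normal_deviate(confidence_percent):
    """Two-sided normal deviate t for a given confidence probability (%)."""
    tail = (1 - confidence_percent / 100) / 2     # probability in each tail
    return NormalDist().inv_cdf(1 - tail)

table = {p: round(normal_deviate(p), 2) for p in (50, 80, 90, 95, 99)}
for p, t in table.items():
    print(f"{p:3d} per cent: t = {t:.2f}")
```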

Example. Signatures to a petition were collected on 676 sheets. 
Each sheet had enough space for 42 signatures, but on many sheets a 
smaller number of signatures had been collected. The numbers of 
signatures per sheet were counted on a random sample of 50 sheets 
(about a 7 per cent sample), with the results shown in table 2.1. 

Estimate the total number of signatures to the petition and the 80
per cent confidence limits. 

The sampling unit is a sheet, and the observations $y_i$ are the
numbers of signatures per sheet. Since about half the sheets had the
maximum number of signatures, 42, the data are presented as a frequency
distribution. Notice that the original distribution appears to be far
from normal, the greatest frequency being at the upper end.
Nevertheless there is reason to believe from experience that the means of
samples of 50 are approximately normally distributed.
We find

\[ n = \sum f_i = 50; \qquad y = \sum f_i y_i = 1471; \qquad \sum f_i y_i^2 = 54{,}497 \]
Hence the estimated total number of signatures is

\[ \hat{Y} = N\bar{y} = \frac{(676)(1471)}{50} = 19{,}888 \]

For the sample variance $s^2$ we have

\[ s^2 = \frac{1}{n-1} \sum f_i (y_i - \bar{y})^2
       = \frac{1}{n-1} \Big[ \sum f_i y_i^2 - \frac{(\sum f_i y_i)^2}{\sum f_i} \Big]
       = \frac{1}{49} \Big[ 54{,}497 - \frac{(1471)^2}{50} \Big] = 229.0 \]

From equation (2.22) the 80 per cent confidence limits are

\[ 19{,}888 \pm \frac{tNs}{\sqrt{n}} \sqrt{\frac{N-n}{N}}
   = 19{,}888 \pm \frac{(1.28)(676)(15.13)\sqrt{1 - 0.0740}}{\sqrt{50}}
   = 19{,}888 \pm 1781 \]

This gives 18,107 and 21,669 for the 80 per cent limits. A complete
count which was made showed 21,045 signatures.
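
The arithmetic of the petition example can be retraced from the three
sample sums quoted in the text (n = 50, total 1471, sum of squares
54,497), with N = 676 sheets and t = 1.28. The variable names below are
chosen for this illustration only.

```python
from math import sqrt

N, n = 676, 50                   # sheets in the petition, sheets sampled
sum_fy, sum_fy2 = 1471, 54_497   # sample total and sample sum of squares
t = 1.28                         # normal deviate for 80 per cent confidence

Y_hat = N * sum_fy / n                        # estimated total, N * ybar
s2 = (sum_fy2 - sum_fy ** 2 / n) / (n - 1)    # sample variance s^2
half_width = t * N * sqrt(s2) / sqrt(n) * sqrt((N - n) / N)

lower, upper = Y_hat - half_width, Y_hat + half_width
print(round(Y_hat), round(s2, 1), round(half_width))
print(round(lower), round(upper))
```

Small differences of a unit or so from the printed limits can arise
because the text rounds s to 15.13 before multiplying.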


TABLE 2.1 RESULTS FOR A SAMPLE OF 50 PETITION SHEETS 


Number of signatures Frequency 
2.8 Validity of the normal approximation. Confidence that the
normal approximation is adequate in most practical situations comes
from a variety of sources. In the theory of probability, much study
has been made of the distribution of means of random samples. It
has been proved that for any population which has a finite standard
deviation the distribution of the sample mean tends to normality as
n increases (see e.g. Feller, 1950). This work relates to infinite
populations. Madow (1948) proved that for a large class of finite
populations the distribution of the sample mean tends to normality even if
the sampling ratio n/N is not negligible and sampling is without
replacement. Madow stipulates that n and N both tend to infinity in
such a way that the ratio n/N remains less than some number r < 1.
His results would apply, for example, even if the sampling ratio were
95 per cent.

Figure 2.1 Frequency distribution of sizes of 196 United States cities in 1920.

This imposing body of knowledge leaves something to be desired. 
It is not easy to answer the direct question: “For this population, how 
large must n be so that the normal approximation is accurate enough?”
Non-normal distributions vary greatly both in the nature and in the 
degree of their departure from normality. In sampling practice, it 
cannot be assumed that the frequency distributions which are en- 
countered will all be reasonably close to normality. The distributions 
of many types of economic enterprise (stores, chicken farms, towns) 
exhibit a marked positive skewness, with a few large units and many 
small units. The same kind of skewness is displayed by some biologi- 
cal populations (e.g. the number of rats or flies per city block). 


Figure 2.2 Frequency distribution of totals of 200 simple random samples with
n = 49. (Axes: frequency against sample total, in millions.)


As an illustration of a positively skewed distribution, figure 2.1 
shows the frequency distribution of the numbers of inhabitants in 
196 large United States cities in 1920. (The 4 largest cities, New 
York, Chicago, Philadelphia, and Detroit, were omitted. Their in- 
clusion would necessitate extending the horizontal scale to over 5 
times the length shown, and would, of course, greatly accentuate the 
skewness.) Figure 2.2 shows the frequency distribution of the total 
number of inhabitants in each of 200 simple random samples, with 
n = 49, drawn from this population. The distribution of the sample 
totals, and likewise of the means, is much more similar to a normal 
curve, but still displays some positive skewness. 

In any discussion of the validity of the normal approximation, we 
must define what it means to say that the normal approximation is 
“accurate enough.” In sample surveys, the normal approximation is 
used primarily to calculate confidence limits. When 95 per cent
confidence limits are computed for the population mean $\bar{Y}$ by the
normal approximation, we make the following statement:

\[ \bar{y} - 1.96 s_{\bar{y}} < \bar{Y} < \bar{y} + 1.96 s_{\bar{y}} \tag{2.23} \]


With repeated sampling, we claim that statements of this kind will 
be wrong only 5 per cent of the time. Consequently, we might say
that the normal approximation is accurate enough if such statements 
are in fact wrong between 4 and 6 per cent of the time. The choice 
of the numbers 4 and 6 is arbitrary: some workers may be satisfied 
with wider limits. 

From the study of theoretical distributions that are skewed and 
from the results of sampling experiments on actual skewed
populations, some statements can be made about what usually happens to
confidence probabilities when we sample from positively skew
populations. The sample size is assumed large enough so that the
distribution of $\bar{y}$ shows some approach to normality, as in figure 2.2.
The statements are as follows:

i. The frequency with which the assertion

\[ \bar{y} - 1.96 s_{\bar{y}} < \bar{Y} < \bar{y} + 1.96 s_{\bar{y}} \]

is wrong, is usually slightly higher than 5 per cent.
ii. The frequency with which

\[ \bar{Y} > \bar{y} + 1.96 s_{\bar{y}} \]

is greater than 2.5 per cent.
iii. The frequency with which

\[ \bar{Y} < \bar{y} - 1.96 s_{\bar{y}} \]

is less than 2.5 per cent.

As an illustration, consider the Poisson distribution. The variate
$y_i$ takes the values 0, 1, 2, ..., the probability that $y_i$ has the value
u being

\[ \Pr(y_i = u) = \frac{e^{-m} m^u}{u!} \]

For the Poisson distribution, the distribution of the total y of a simple
random sample of size n is known to be a Poisson distribution with
parameter m’ = nm. From tables of this distribution (Molina, 1949)
we can therefore find out how well the normal approximation to the
confidence limits works for different values of n and m.

Let us take m = 0.25, n = 400. For m = 0.25, the probabilities that
$y_i$ = 0, 1, 2, 3, and 4 are, respectively, 0.7788, 0.1947, 0.0243, 0.0020,
and 0.0001. The original distribution is obviously extremely skew.

The sample total y follows a Poisson distribution with parameter
m’ = (400)(0.25) = 100. For this distribution, there are
theoretical results to the effect that

\[ E(y) = m'; \qquad \sigma_y^2 = E(y - m')^2 = m' \]
Consequently y is an unbiased estimate of m’. As a sample estimate
of the standard error of y, we take

\[ s_y = \sqrt{y} \]

Hence, 95 per cent confidence limits for m’ are constructed by the
normal approximation as

\[ y - 1.96\sqrt{y} < m' < y + 1.96\sqrt{y} \]

Consider the probability that this statement is wrong. If y is 82, the 
upper limit $(y + 1.96\sqrt{y})$ turns out to be 99.7; and if y is 83, the
upper limit is 100.9. Thus the stated upper limit is too low (since m’ 
actually is 100) whenever y is 82 or less in sampling from a Poisson 
distribution with m’ = 100. From Molina’s tables the probability 
that y is 82 or less is found to be 0.0369. Similarly, the lower limit in 
the statement is found to be too high whenever y is 122 or greater. 
The corresponding probability is 0.0181. To summarize, 


Pr(stated upper limit too low) = 0.0369 
Pr(stated lower limit too high) = 0.0181 


Pr(confidence statement wrong) = 0.0550 


The total probability of being wrong is satisfactorily close to 0.05, 
but in about 70 per cent of the statements that are wrong, the true 
m’ is higher than the stated upper limit. 
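
The two tail probabilities in this illustration were read from Molina's
tables. As a rough check they can also be computed directly from the
Poisson formula. The sketch below is a plain summation of the
probabilities, with helper names that are assumptions of this
illustration. It first locates the values of y at which each stated
limit fails and then adds up the corresponding tails for m’ = 100.

```python
from math import exp, lgamma, log, sqrt

def poisson_pmf(u, mean):
    """Pr(y = u) = exp(-mean) * mean**u / u!, computed on the log scale."""
    return exp(-mean + u * log(mean) - lgamma(u + 1))

def poisson_cdf(k, mean):
    """Pr(y <= k)."""
    return sum(poisson_pmf(u, mean) for u in range(k + 1))

m_total = 100   # m' = n * m = (400)(0.25)

# Largest y for which the stated upper limit still falls below m'
y_low = max(y for y in range(1, 4 * m_total) if y + 1.96 * sqrt(y) < m_total)
# Smallest y for which the stated lower limit already exceeds m'
y_high = min(y for y in range(1, 4 * m_total) if y - 1.96 * sqrt(y) > m_total)

p_upper_too_low = poisson_cdf(y_low, m_total)
p_lower_too_high = 1 - poisson_cdf(y_high - 1, m_total)
p_wrong = p_upper_too_low + p_lower_too_high

print(y_low, y_high)
print(p_upper_too_low, p_lower_too_high, p_wrong)
```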

The result appears to be typical. If we are interested only in the
absolute value of the error of estimate, a fair amount of positive
skewness in the distribution of $\bar{y}$ can be tolerated, but if the frequency
with which $\bar{Y}$ exceeds the upper confidence limit is to be close to 2.5
per cent, the normal approximation is not trustworthy unless very
little skewness remains in the distribution of $\bar{y}$.

There is no safe general rule as to how large n must be for use of
the normal approximation in computing confidence limits. For
populations in which the principal deviation from normality consists of
marked positive skewness, a crude rule which I have occasionally
found useful is

\[ n > 25 G_1^2 \]

where $G_1$ is Fisher’s measure of skewness (Fisher, 1932),

\[ G_1 = \frac{E(y_i - \bar{Y})^3}{\sigma^3}
       = \frac{1}{N\sigma^3} \sum_{i=1}^{N} (y_i - \bar{Y})^3 \]

This rule is designed so that a 95 per cent confidence probability
statement will be wrong not more than 6 per cent of the time. It is
derived mathematically by assuming that any disturbance due to
moments of the distribution of $\bar{y}$ higher than the third is negligible. The
rule attempts to control only the total frequency of wrong statements,
ignoring the direction of the error of estimate.

By calculating $G_1$, or an estimate, for a specific population, we can
obtain a rough idea of the sample size needed for application of the
normal approximation to compute confidence limits. The result
should be checked by sampling experiments whenever possible.


TABLE 2.2 FREQUENCY DISTRIBUTION OF ACRES IN CROPS ON 556 FARMS


Class         Coded
intervals     scale    Frequency
(acres)       y_i      f_i        f_i y_i    f_i y_i^2    f_i y_i^3

  0-29        -0.9      47         -42.3        38.1        -34.3
 30-63         0       143           0           0            0
 64-97         1       154         154         154          154
 98-131        2        82         164         328          656
132-165        3        62         186         558        1,674
166-199        4        33         132         528        2,112
200-233        5        13          65         325        1,625
234-267        6         6          36         216        1,296
268-301        7         4          28         196        1,372
302-335        8         6          48         384        3,072
336-369        9         2          18         162        1,458
370-403       10         0           0           0            0
404-437       11         2          22         242        2,662
438-471       12         0           0           0            0
472-505       13         2          26         338        4,394

Totals                 556         836.7     3,469.1     20,440.7

\[ E(y_i) = \bar{Y} = \frac{836.7}{556} = 1.50486 \]

\[ E(y_i^2) = \frac{3{,}469.1}{556} = 6.23939 \]

\[ E(y_i^3) = \frac{20{,}440.7}{556} = 36.76385 \]

\[ \sigma^2 = E(y_i^2) - \bar{Y}^2 = 3.97479 \]

\[ \kappa_3 = E(y_i - \bar{Y})^3 = E(y_i^3) - 3E(y_i^2)\bar{Y} + 2\bar{Y}^3 = 15.411 \]

\[ G_1 = \frac{\kappa_3}{\sigma^3} = 1.9 \]

Example. The data in table 2.2 show the numbers of acres devoted
to crops in 556 farms in Seneca County, New York. The data come
from a series of studies by West (1951), who drew repeated samples of
size 100 from this population and examined the frequency
distributions of $\bar{y}$, s, and Student’s t for several items of interest in
farm management surveys.

The computation of $G_1$ is shown under the table. The computations
are made on a coded scale, and since $G_1$ is a pure number, there is no
need to return to the original scale. Note that the first class-interval
was slightly different from the others.

Since $G_1$ = 1.9, we take as a suggested minimum n

\[ n = (25)(1.9)^2 = 90 \]

For samples of size 100, West found with this item (acres in crops)
that neither the distribution of $\bar{y}$ nor that of Student’s t differed
significantly from the corresponding theoretical normal distributions.
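
The computation of $G_1$ under table 2.2 can be repeated from the coded
scale and the frequencies. The sketch below copies those two columns
from the table. The helper raw_moment and the variable names are
assumptions of this illustration.

```python
coded = [-0.9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]   # coded scale y_i
freq = [47, 143, 154, 82, 62, 33, 13, 6, 4, 6, 2, 0, 2, 0, 2]   # frequencies f_i

total = sum(freq)                    # 556 farms

def raw_moment(k):
    """E(y_i^k) on the coded scale."""
    return sum(f * y ** k for y, f in zip(coded, freq)) / total

mean = raw_moment(1)
variance = raw_moment(2) - mean ** 2                                  # sigma^2
kappa3 = raw_moment(3) - 3 * raw_moment(2) * mean + 2 * mean ** 3     # third moment about the mean
G1 = kappa3 / variance ** 1.5                                         # Fisher's G_1

n_min = 25 * G1 ** 2                 # the crude rule n > 25 * G_1^2
print(round(mean, 5), round(variance, 5), round(kappa3, 3))
print(round(G1, 2), round(n_min))
```

Carrying $G_1$ unrounded gives a slightly larger minimum n than the
figure obtained from the rounded value 1.9. The rule is in any case
only a rough guide.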

Good sampling practice tends to make the normal approximation 
more valid. Failure of the normal approximation occurs mostly when 
the population contains some extreme individuals which dominate the
sample average when they are present. However, these extremes also 
have a much more serious effect of increasing the variance of the sam- 
ple and decreasing the precision. Consequently, it is wise to segregate 
them and make separate plans for coping with them, perhaps by tak- 
ing a complete enumeration of them if they are not numerous. This 
removal of the extremes from the main body of the population re- 
duces the skewness and improves the normal approximation. This 
technique is an example of stratified sampling, which is discussed in 
chapter 5. 


2.9 Effect of non-normality on the estimated variance. One effect of
non-normality is that the estimated variance $s^2$ may be more highly
variable from sample to sample than we expect if we assume that we
are sampling from a normal distribution. For any infinite population,
the variance of $s^2$ in random samples of size n is (Fisher, 1932)

\[ V(s^2) = \frac{2\sigma^4}{n-1} + \frac{\kappa_4}{n} \tag{2.24} \]

The first term after the equality sign is the value which the variance
of $s^2$ has when the parent distribution is normal. The second term
represents the effect of non-normality. The quantity $\kappa_4$ is Fisher’s
fourth cumulant (Fisher, 1932) and is given by

\[ \kappa_4 = E(y_i - \bar{Y})^4 - 3\sigma^4 \]

Note that skewness in the original distribution, as measured by $G_1$,
does not affect the stability of $s^2$: the important factor is the fourth
moment in the parent population.

The cumulant $\kappa_4$ is zero for a normal distribution. It may take
either positive or negative values in other distributions, but in those
encountered in sampling practice, $\kappa_4$ appears to be positive much
more often than negative, and may have a high value for some parent
distributions.

We may write (2.24) as

\[ V(s^2) = \frac{2\sigma^4}{n-1} \Big( 1 + \frac{n-1}{2n}\,\frac{\kappa_4}{\sigma^4} \Big)
          = \frac{2\sigma^4}{n-1} \Big( 1 + \frac{n-1}{2n}\,G_2 \Big) \]

where $G_2 = \kappa_4/\sigma^4$ is Fisher’s measure of kurtosis (loc. cit.). The
quantity inside the parentheses shows the factor by which the
variance of $s^2$ is inflated owing to non-normality. Note that the factor is
almost independent of n, so that the inflation remains even with large
samples.

For West’s data on farm acres in crops (table 2.2), the value of $G_2$
will be found to be about 6. Thus $V(s^2)$ is close to 4 times as large as
would be assumed if we regarded the original distribution of acres in 
crops as normal. In his sampling studies, West found a similar in- 
flation in the variance of the standard deviation s, in 3 items which 
he tested. The ratio of V(s) to the theoretical variance of s from a 
normal population was 3.7 for acres in crops, 2.1 for total acres oper- 
ated, and 13.7 for productive-man-work units. (By theory this ratio 
should be roughly the same for s as for $s^2$.)
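
The value of $G_2$ quoted for table 2.2, and the resulting inflation of
$V(s^2)$, can be computed from the same coded frequency distribution. The
sketch below repeats the two columns of the table so that it stands
alone, and takes n = 100 (the size of West's samples) in the inflation
factor. The function and variable names are assumptions of this
illustration.

```python
coded = [-0.9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]   # coded scale y_i
freq = [47, 143, 154, 82, 62, 33, 13, 6, 4, 6, 2, 0, 2, 0, 2]   # frequencies f_i
total = sum(freq)

mean = sum(f * y for y, f in zip(coded, freq)) / total

def central_moment(k):
    """E(y_i - Ybar)^k on the coded scale."""
    return sum(f * (y - mean) ** k for y, f in zip(coded, freq)) / total

sigma2 = central_moment(2)
kappa4 = central_moment(4) - 3 * sigma2 ** 2    # fourth cumulant
G2 = kappa4 / sigma2 ** 2                       # Fisher's measure of kurtosis

def inflation(n, G2):
    """Factor multiplying the normal-theory variance of s^2."""
    return 1 + (n - 1) / (2 * n) * G2

print(round(G2, 2), round(inflation(100, G2), 2))
```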

The relevance of these results in practical sampling is that we
sometimes use values of $s^2$ to compare the precision of one method of
sampling with that of another, or to estimate the sample size needed
to attain a specified degree of precision in $\bar{y}$ (see chapter 4). For these
purposes it is well to have some idea of the precision of the estimate
$s^2$, particularly if it has been calculated from rather scanty data. As
the previous results indicate, use of the “normal” formula for
appraising the variance of $s^2$ may give a very misleading impression of the
stability of $s^2$.
2.10 Exercises.


2.1 In a population with N = 6, the values of $y_i$ are 8, 3, 1, 11, 4, and 7.
Calculate the sample mean for all possible simple random samples of size 2.
Verify that $\bar{y}$ is an unbiased estimate of $\bar{Y}$ and that its variance is as given in
theorem 2.2.

2.2 For the same population, calculate $s^2$ for all simple random samples of
size 3, and verify that $E(s^2) = S^2$.

2.3 If random samples of size 2 are drawn with replacement from this
population, show by finding all possible samples that $V(\bar{y})$ satisfies the
equation

\[ V(\bar{y}) = \frac{\sigma^2}{n} = \frac{S^2}{n}\,\frac{(N-1)}{N} \]

Give a general proof of this result.
2.4 A simple random sample of 30 households was drawn from a city area
containing 14,848 households. The numbers of persons per household in the
sample were as follows:


5, 6, 3, 3, 2, 3, 3, 3, 4, 4, 3, 2, 7, 4, 3, 5, 4, 4, 3, 3, 4, 3, 3, 1, 2, 4, 3, 4, 2, 4


Estimate the total number of people in the area and compute the probability
that this estimate is within ±10 per cent of the true value.

2.5 The table below shows the numbers of inhabitants in each of the 197
United States cities which had populations over 50,000 in 1940. Calculate
the standard error of the estimated total number of inhabitants in all 197
cities for the following methods of sampling: (i) a simple random sample of
size 50, (ii) a sample which includes the 5 largest cities and is a simple random
sample of size 45 from the remaining 192 cities, (iii) a sample which includes
the 9 largest cities and is a simple random sample of size 41 from the remaining
cities.

              FREQUENCY DISTRIBUTION OF CITY SIZES

Size class          Size class          Size class
 (1000's)     f      (1000's)     f      (1000's)     f
  50-100    105      550-600      2        ....
 100-150     36      600-650      1      1500-1550    1
 150-200     13      650-700      2        ....
 200-250      6      700-750      0      1600-1650    1
 250-300      7      750-800      1        ....
 300-350      8      800-850      1      1900-1950    1
 350-400      4      850-900      2        ....
 400-450      1      900-950      0      3350-3400    1
 450-500      3      950-1000     0        ....
 500-550      0     1000-1050     0      7450-7500    1

Gaps in the intervals are indicated by ....


2.6 Calculate the coefficient of skewness $G_1$ for the original population
and for the population remaining after removing (i) the 5 largest cities, (ii)
the 9 largest cities.

2.7 With certain populations it is known that the observations $y_i$ are all
zero on a portion $qN$ of the N units (0 < q < 1). Sometimes, with varying
expenditures of effort, these units can be found and listed, so that they need
not be sampled. If $\sigma^2$ is the variance of $y_i$ in the original
population, and $\sigma_0^2$ is the variance when all zeros are excluded,
show that

$$\sigma_0^2 = \frac{\sigma^2}{p} - \frac{q}{p^2}\,\bar{Y}^2$$

where $p = 1 - q$.

If the population total is estimated from a simple random sample of size
n, show that with the exclusion of the “zero” units the fractional reduction
in the variance of the estimate is

$$\frac{q(V^2 + 1)}{V^2}$$

where $V^2 = \sigma^2/\bar{Y}^2$ is the square of the coefficient of variation
in the original population. (For further discussion of this technique, see
Jessen and Houseman, 1944.) The fpc may be omitted.

CHAPTER 3
SAMPLING FOR PROPORTIONS AND PERCENTAGES


3.1 Qualitative characteristics. Sometimes we wish to estimate the 
total number, or the proportion, or the percentage of units in the 
population which possess some characteristic or attribute, or fall into 
some defined class. Many of the results regularly published from 
censuses or surveys are of this form, e.g. numbers of unemployed 
persons, percentage of the population that is native-born. The classi- 
fication may be introduced directly into the questionnaire, as with 
questions that are answered by a simple “yes” or “no.” In other 
cases the original measurements are more or less continuous, and the 
classification is introduced in the tabulation of results. Thus we may 
record the respondents’ ages to the nearest year, but publish the per- 
centage of the population aged 60 and over. 

Notation. We suppose that every unit in the population falls into 
one of the two classes C and C’. The notation is as follows: 


   Number of units in C in         Proportion of units in C in
   Population      Sample          Population       Sample
       A              a             P = A/N         p = a/n


The sample estimate of P is p, and the sample estimate of A is Np 
or Na/n. 

3.2 Variances of the sample estimates. By means of a simple device
it is possible to apply the theorems established in chapter 2 to this
situation. For any unit in the sample or population, define $y_i$ as 1
if the unit is in C and as 0 if it is in C’. For this population of values
$y_i$ it is clear that

$$Y = \sum_{i=1}^{N} y_i = A \tag{3.1}$$

$$\bar{Y} = \frac{\sum_{i=1}^{N} y_i}{N} = \frac{A}{N} = P \tag{3.2}$$

Also, for the sample,

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{a}{n} = p \tag{3.3}$$


Consequently the problem of estimating A and P can be regarded
as that of estimating the total and mean of a population in which
every $y_i$ is either 1 or 0. In order to use the theorems in chapter 2,
we first express $S^2$ and $s^2$ in terms of P and p. Note that

$$\sum_{i=1}^{N} y_i^2 = A = NP; \qquad \sum_{i=1}^{n} y_i^2 = a = np$$

Hence,

$$S^2 = \frac{\sum_{i=1}^{N}(y_i - \bar{Y})^2}{N-1} = \frac{\sum_{i=1}^{N} y_i^2 - N\bar{Y}^2}{N-1} = \frac{1}{N-1}(NP - NP^2) = \frac{N}{N-1}\,PQ \tag{3.4}$$

where $Q = 1 - P$. Similarly, with $q = 1 - p$,

$$s^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n-1} = \frac{n}{n-1}\,pq \tag{3.5}$$

Application of theorems 2.1, 2.2, and 2.4 to this population gives
the following results for simple random sampling of the units that are
being classified.

Theorem 3.1 The sample proportion p = a/n is an unbiased estimate
of the population proportion P = A/N.

Theorem 3.2 The variance of p is

$$V(p) = E(p - P)^2 = \frac{S^2}{n}\,\frac{(N-n)}{N} = \frac{PQ}{n}\,\frac{(N-n)}{(N-1)} \tag{3.6}$$

using equation (3.4).

Corollary 1 If p and P are the sample and population percentages,
respectively, falling into class C, formula (3.6) continues to hold for
the variance of p.

Corollary 2 The variance of $\hat{A} = Np$, the estimated total number
of units in class C, is

$$V(\hat{A}) = \frac{N^2 PQ}{n}\,\frac{(N-n)}{(N-1)} \tag{3.7}$$

Theorem 3.3 An unbiased estimate of the variance of p, derived
from the sample, is

$$v(p) = s_p^2 = \frac{(N-n)}{(n-1)N}\,pq \tag{3.8}$$

Proof: In theorem 2.4, corollary, it was shown that for a continuous
variate $y_i$ an unbiased estimate of the variance of the sample mean
$\bar{y}$ is

$$v(\bar{y}) = s_{\bar{y}}^2 = \frac{s^2}{n}\,\frac{(N-n)}{N} \tag{3.9}$$

For proportions, p takes the place of $\bar{y}$, and in equation (3.5) we
showed that

$$s^2 = \frac{n}{(n-1)}\,pq$$

Hence,

$$v(p) = s_p^2 = \frac{(N-n)}{(n-1)N}\,pq \tag{3.10}$$

It follows that if N is very large relative to n, so that the fpc is
negligible, an unbiased estimate of the variance of p is

$$\frac{pq}{n-1}$$

This result may appear puzzling to some readers, since the expression
pq/n is almost invariably used in practice for the estimated variance.
The fact is that pq/n is not unbiased even with an infinite population.

Corollary. An unbiased estimate of the variance of $\hat{A} = Np$, the
estimated total number of units in class C in the population, is

$$v(\hat{A}) = s_{Np}^2 = \frac{N(N-n)}{(n-1)}\,pq \tag{3.11}$$
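Theorems 3.1 to 3.3 can be checked by brute force on a small population. The following Python sketch enumerates every simple random sample and compares the exact mean and variance of p, and the average of v(p) from equation (3.8), with the stated formulas. The population (N = 7, A = 3), the sample size n = 3, and the name `check_theorems` are illustrative assumptions and do not come from the text.

```python
from fractions import Fraction
from itertools import combinations


def check_theorems(population, n):
    """Enumerate all simple random samples of size n from a 0/1 population.

    Returns (mean of p, P, variance of p, formula (3.6), mean of v(p)).
    Exact fractions are used so the comparisons are not blurred by rounding.
    """
    N = len(population)
    P = Fraction(sum(population), N)
    Q = 1 - P

    # combinations() treats positions as distinct units, so every sample
    # of n units is listed once and all are equally likely.
    ps = [Fraction(sum(s), n) for s in combinations(population, n)]

    mean_p = sum(ps) / len(ps)
    var_p = sum((p - P) ** 2 for p in ps) / len(ps)

    # Sample estimate of variance, equation (3.8), averaged over all samples.
    v_hats = [Fraction(N - n, (n - 1) * N) * p * (1 - p) for p in ps]
    mean_v = sum(v_hats) / len(v_hats)

    formula_var = P * Q / n * Fraction(N - n, N - 1)  # equation (3.6)
    return mean_p, P, var_p, formula_var, mean_v


# Hypothetical population: 3 units in class C (coded 1), 4 units in C' (coded 0).
mean_p, P, var_p, formula_var, mean_v = check_theorems([1, 1, 1, 0, 0, 0, 0], n=3)
```

For this population the three comparisons agree exactly: the mean of p equals P, the variance of p equals formula (3.6), and the average of v(p) equals the variance of p.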

Example. From a list of 3042 names and addresses, a simple random
sample of 200 names showed on investigation 38 wrong addresses.
Estimate the total number of addresses needing correction in the list
and find the standard error of this estimate. We have

N = 3042; n = 200; a = 38; p = 0.19

The estimated total number of wrong addresses is

$$\hat{A} = Np = (3042)(0.19) = 578$$

$$s_{Np} = \sqrt{\frac{(3042)(2842)(0.19)(0.81)}{199}} = 81.8$$

Since the sampling ratio is under 7 per cent, the fpc makes little
difference. To remove it, replace the term (N - n) by N. If, in addition,
we replace (n - 1) by n, we have the simpler formula

$$s_{Np} = N\sqrt{\frac{pq}{n}} = (3042)\sqrt{\frac{(0.19)(0.81)}{200}} = 84.4$$

This is in fairly close agreement with the previous result, 81.8.
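The arithmetic of this example can be reproduced with a few lines of Python. The sketch below applies equation (3.11) and the simplified formula without the fpc; the function name `estimate_class_total` and its `use_fpc` flag are illustrative choices, not notation from the text.

```python
import math


def estimate_class_total(N, n, a, use_fpc=True):
    """Estimated total in class C and its standard error.

    With use_fpc=True the variance is N(N - n)pq/(n - 1), equation (3.11).
    With use_fpc=False it is the simplified N^2 pq/n.
    """
    p = a / n
    q = 1 - p
    total = N * p
    if use_fpc:
        variance = N * (N - n) * p * q / (n - 1)
    else:
        variance = N * N * p * q / n
    return total, math.sqrt(variance)


# Address-list example: N = 3042, n = 200, a = 38.
total, se = estimate_class_total(3042, 200, 38)
_, se_simple = estimate_class_total(3042, 200, 38, use_fpc=False)
```

Rounded as in the text, `total` is 578, `se` is 81.8, and `se_simple` is 84.4.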

The preceding formulas for the variance and the estimated variance 
of p hold only if the units are classified into C or C’ so that p is the 
ratio of the number of units in C in the sample to the total number 
of units in the sample. There is a common situation in which each 
unit is composed of a group of elements, and it is the elements that 
are classified. A few examples are as follows: 


Sampling unit        Elements
Family               Members of the family
Restaurant           Employees
Crate of eggs        Individual eggs
Peach tree           Individual peaches


If a simple random sample of units is drawn in order to estimate the 
proportion P of elements in the population which belong to class C, 
the preceding formulas do not apply, except sometimes as a fair 
approximation. 

If each sampling unit contains the same number M of elements, let
$p_i = a_i/M$ be the proportion of elements in C in the ith unit and let
$p = \sum p_i/n$ be the sample estimate of P. The correct procedure is
to apply the formulas of chapter 2 to the quantities $p_i$. With simple
random sampling,

$$V(p) = \frac{(N-n)}{N}\,\frac{1}{n}\,\frac{\sum_{i=1}^{N}(p_i - P)^2}{N-1}$$

and an unbiased estimate of this variance is

$$v(p) = \frac{(N-n)}{N}\,\frac{1}{n}\,\frac{\sum_{i=1}^{n}(p_i - p)^2}{n-1}$$

If M varies from unit to unit, the problem is more complicated:
appropriate methods are presented in section 6.9.
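As a minimal sketch of the equal-M case, the Python function below treats the unit proportions $p_i$ as the observations and applies the chapter 2 formula for the estimated variance of a mean. The function name `cluster_proportion` and the data in the usage line (three crates of M = 10 eggs from N = 30 crates) are hypothetical.

```python
def cluster_proportion(unit_counts, M, N):
    """Estimate P and v(p) when each sampled unit holds M elements.

    unit_counts[i] is a_i, the number of elements in class C in unit i;
    N is the number of units in the population.
    """
    n = len(unit_counts)
    p_i = [a / M for a in unit_counts]
    p = sum(p_i) / n
    # Sample variance of the unit proportions, divisor n - 1.
    s2 = sum((x - p) ** 2 for x in p_i) / (n - 1)
    v = (N - n) / N * s2 / n
    return p, v


# Hypothetical data: 2, 4, and 6 elements in class C in three sampled units.
p, v = cluster_proportion([2, 4, 6], M=10, N=30)
```

The variance here is driven by how much the $p_i$ differ from unit to unit, which is why the binomial-type formulas (3.6) and (3.8) do not carry over.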


3.3 The effect of P on the standard errors. Equation (3.6) shows
how the variance of the estimated percentage changes with P, for
fixed n and N. If the fpc is ignored, we have

$$V(p) = \frac{PQ}{n}$$

The function PQ and its square root are shown in table 3.1. These
functions may be regarded as the variance and standard deviation,
respectively, for a sample of size 1.

TABLE 3.1 VALUES OF PQ AND $\sqrt{PQ}$

P = Population percentage in class C.

P              0    10    20    30    40    50    60    70    80    90   100
PQ             0   900  1600  2100  2400  2500  2400  2100  1600   900     0
$\sqrt{PQ}$    0    30    40    46    49    50    49    46    40    30     0


The functions have their greatest values when the population is
equally divided between the two classes, and are symmetrical about
this point. The standard error of p changes relatively little when P
lies anywhere between 30 and 70 per cent. At the maximum value of
$\sqrt{PQ}$, 50, a sample size of 100 is needed to reduce the standard
error of the estimate to 5 per cent. To attain a 1 per cent standard error
requires a sample size of 2500.
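The sample sizes quoted above follow from solving $\sqrt{PQ/n}$ for n. A short Python sketch, with P and the standard error both expressed in per cent as in table 3.1 and with the fpc ignored, is given below. The function names are illustrative.

```python
import math


def standard_error_pct(P, n):
    """Standard error of p in per cent, fpc ignored; P is in per cent."""
    return math.sqrt(P * (100 - P) / n)


def sample_size_for_se(P, se):
    """Smallest n whose standard error does not exceed se (both in per cent)."""
    return math.ceil(P * (100 - P) / se ** 2)


n_for_5 = sample_size_for_se(50, 5)   # 100
n_for_1 = sample_size_for_se(50, 1)   # 2500
```

Because the standard error falls only as the square root of n, cutting it from 5 to 1 per cent multiplies the required sample size by 25.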

This approach is not appropriate when interest lies in the total
number of units in the population which are in class C. In this event
it is more natural to ask: Is the estimate likely to be correct to within,
say, 7 per cent of the true total? Thus we tend to think of the standard
error expressed as a fraction or percentage of the true value, NP.
The fraction is

$$\frac{\sigma_{Np}}{NP} = \frac{N\sqrt{PQ}}{\sqrt{n}\,NP}\sqrt{\frac{N-n}{N-1}} = \frac{1}{\sqrt{n}}\sqrt{\frac{Q}{P}}\sqrt{\frac{N-n}{N-1}} \tag{3.12}$$

This quantity is usually called the coefficient of variation of the
estimate. If the fpc is ignored, the coefficient is $\sqrt{Q/nP}$. The ratio
$\sqrt{Q/P}$, which might be considered the coefficient of variation for a
sample of size 1, is shown in table 3.2.

TABLE 3.2 VALUES OF $\sqrt{Q/P}$ FOR DIFFERENT VALUES OF P

P = Population percentage in class C.

P                0    0.1    0.5     1      5     10     20
$\sqrt{Q/P}$     ∞   31.6   14.1    9.9    4.4    3.0    2.0

P               30     40     50     60     70     80     90
$\sqrt{Q/P}$    1.5    1.2    1.0    0.8    0.7    0.5    0.3


For a fixed sample size, the coefficient of variation of the estimated
total in class C decreases steadily as the true percentage in C increases.
The coefficient is high when P is less than 5 per cent. Very large
samples are needed for precise estimates of the total number possessing
any attribute that is rare in the population. For P = 1 per cent,
we must have $\sqrt{n} = 99$ in order to reduce the coefficient of variation
of the estimate to 0.1 or 10 per cent. This gives a sample size of 9801.
Simple random sampling, or any method of sampling that is adapted
for general purposes, tends to be an expensive method of estimating
the total number of units of a scarce type. The problem is analogous
to that of finding the total number of needles in a haystack.
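The coefficient of variation in (3.12) and the sample size it implies can be computed directly. In the Python sketch below P is a proportion, not a percentage, and the function names are illustrative. Solving $\sqrt{Q/nP} = 0.1$ without rounding gives n = 9900 for P = 0.01; the figure 9801 quoted above appears to result from using the tabulated value 9.9 for $\sqrt{Q/P}$.

```python
import math


def cv_of_estimated_total(P, n, N=None):
    """Coefficient of variation of Np, equation (3.12).

    P is a proportion. If N is omitted the fpc is ignored.
    """
    cv = math.sqrt((1 - P) / (n * P))
    if N is not None:
        cv *= math.sqrt((N - n) / (N - 1))
    return cv


def sample_size_for_cv(P, target_cv):
    """Value of n (not rounded) giving the target cv, fpc ignored."""
    return (1 - P) / (P * target_cv ** 2)


n_needed = sample_size_for_cv(0.01, 0.1)   # about 9900
```

In practice the result would be rounded up to the next whole number of units.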


3.4 The binomial distribution. Since the population is of a particularly
simple type, in which the $y_i$ are either 1 or 0, we can find the
actual frequency distribution of the estimate p and not merely its
mean and variance.

The population contains A units that are in class C and (N - A)
units in C’, where P = A/N. If the first unit that is drawn happens
to be in C, there will remain in the population (A - 1) units in C
and (N - A) in C’. Thus the proportion of units in C, after the first
draw, changes slightly to (A - 1)/(N - 1). Alternatively, if the first
unit drawn is in C’, the proportion in C changes to A/(N - 1). In
sampling without replacement, the proportion keeps changing in this
way throughout the draw. In the present section these variations
are ignored, i.e. P is assumed constant. This amounts to assuming
that A and (N - A) are both large relative to the sample size n.

With this assumption, the process of drawing the sample consists
of a series of n trials, in each of which the probability that the unit
drawn is in C is P. This situation gives rise to the familiar binomial
frequency distribution for the number of units in C in the sample.
The probability that the sample contains a units in C is

$$\Pr(a) = \frac{n!}{a!\,(n-a)!}\,P^a Q^{n-a} \tag{3.13}$$

From this expression we may tabulate the frequency distribution of
a, or of p = a/n, or of the estimated total Np. The most comprehensive
tables are those published by the U. S. National Bureau of Standards
(1950). They give both individual terms and the cumulative
sums of the terms for sample sizes up to 49 and for P varying by intervals
of 0.01. For tables with n between 50 and 100, see Romig (1952).
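Where printed tables are not at hand, the terms of (3.13) and their cumulative sums are easy to compute. The Python sketch below is one way to do so; the function names are illustrative, and `math.comb` supplies the binomial coefficient.

```python
from math import comb


def binomial_pmf(a, n, P):
    """Probability of exactly a units in class C, equation (3.13)."""
    return comb(n, a) * P ** a * (1 - P) ** (n - a)


def binomial_cdf(a, n, P):
    """Probability of a or fewer units in class C (cumulative sum)."""
    return sum(binomial_pmf(j, n, P) for j in range(a + 1))


# Individual term for n = 4, P = 0.5, a = 2: 6 * 0.25 * 0.25 = 0.375.
term = binomial_pmf(2, 4, 0.5)
```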


3.5 The general distribution of p. The distribution of p can be found
without the assumption that the population is large relative to the
sample. The numbers of units in the two classes C and C’ in the
population are A and A’, respectively. We will calculate the probability
that the corresponding numbers in the sample are a and a’, respectively,
where

$$a + a' = n; \qquad A + A' = N$$

We may assume that $a \le A$ and $a' \le A'$, because it is impossible to
draw a sample in which these inequalities do not hold, the sampling
being without replacement.

Consider a sample in which the first a units drawn all fall in class
C. At the first draw, the probability that a C is obtained is A/N.
After the first draw, there remain (A - 1) units in C in the population,
so that the probability of obtaining a C at the second draw is
(A - 1)/(N - 1). Since it is supposed that a C turns up at the second
draw also, the probability of a C at the third draw is
(A - 2)/(N - 2), and so on. The probability that the first a units drawn
are all in C is the product

$$\frac{A(A-1)(A-2)\cdots(A-a+1)}{N(N-1)(N-2)\cdots(N-a+1)} \tag{3.14}$$

To obtain the type of sample in which we are interested, all the
remaining units drawn must fall in C’. At the (a + 1)th draw, the
population has A’ units in class C’ out of (N - a) units. The
probabilities of a C’ at successive draws are therefore

$$\frac{A'}{(N-a)}, \quad \frac{(A'-1)}{(N-a-1)}, \quad \frac{(A'-2)}{(N-a-2)}, \quad \text{etc.}$$

Hence, the probability that all remaining units fall in C’ is the product

$$\frac{A'(A'-1)(A'-2)\cdots(A'-a'+1)}{(N-a)(N-a-1)\cdots(N-n+1)} \tag{3.15}$$


Multiplication of (3.14) and (3.15) gives the probability of obtaining
a sample in which the first a units all fall in C and the remaining a’
units all fall in C’:

$$\frac{A(A-1)\cdots(A-a+1)\,A'(A'-1)\cdots(A'-a'+1)}{N(N-1)\cdots(N-n+1)} \tag{3.16}$$

The reader may verify that this expression also holds for any specified
order in which the units appear, provided that a of them are in C and
a’ in C’. To find the total probability, we count how many different
orders can be specified, that is, how many distinct ways a objects of
one kind, and a’ objects of another kind, can be arranged in order along
a line. This number is given by the familiar quantity ${}_nC_a$, or

$$\frac{n!}{a!\,(a')!}$$

Finally, the probability that the sample contains a units in C and
a’ in C’ is

$$\Pr(a, a'/A, A') = \frac{n!}{a!\,a'!}\,\frac{A(A-1)\cdots(A-a+1)\,A'(A'-1)\cdots(A'-a'+1)}{N(N-1)\cdots(N-n+1)} \tag{3.17}$$

This result may be written in more compact form as

$$\Pr(a, a'/A, A') = \frac{{}_A C_a \cdot {}_{A'} C_{a'}}{{}_N C_n} \tag{3.18}$$

This is the frequency distribution of a or np, from which that of p
is immediately derivable. The distribution is called the hypergeometric
distribution.

Example. A family of 8 contains 3 males and 5 females. Find the
frequency distribution of the number of males in a simple random
sample of size 4. In this case

A = 3; A’ = 5; N = 8; n = 4

From formula (3.17) the distribution of the number of males, a, is
as follows:

a      Probability
0      $\frac{4!}{0!\,4!}\cdot\frac{5\cdot4\cdot3\cdot2}{8\cdot7\cdot6\cdot5} = \frac{1}{14}$
1      $\frac{4!}{1!\,3!}\cdot\frac{3\cdot5\cdot4\cdot3}{8\cdot7\cdot6\cdot5} = \frac{6}{14}$
2      $\frac{4!}{2!\,2!}\cdot\frac{3\cdot2\cdot5\cdot4}{8\cdot7\cdot6\cdot5} = \frac{6}{14}$
3      $\frac{4!}{3!\,1!}\cdot\frac{3\cdot2\cdot1\cdot5}{8\cdot7\cdot6\cdot5} = \frac{1}{14}$
4      Impossible = 0

The reader may verify that the mean number of males is 3/2 and the
variance is 15/28. These results agree with the formulas previously
established, section 3.2, which give

$$E(np) = nP = \frac{nA}{N} = \frac{(4)(3)}{8} = \frac{3}{2}$$

$$V(np) = nPQ\,\frac{N-n}{N-1} = 4\cdot\frac{3}{8}\cdot\frac{5}{8}\cdot\frac{4}{7} = \frac{15}{28}$$

3.6 Confidence limits. We first discuss the meaning of confidence
limits in the case of qualitative characteristics. In the sample, a out
of n fall in class C. Suppose that inferences are to be made about the
number A in the population which fall in class C. For an upper
confidence limit to A, we compute a value $\hat{A}_U$ such that for this
value the probability of getting a or less falling in C in the sample is
some small quantity $\alpha_U$, e.g. 0.025. Formally, $\hat{A}_U$ satisfies
the equation

$$\sum_{j=0}^{a} \Pr(j, n-j/\hat{A}_U, N-\hat{A}_U) = \alpha_U \tag{3.19}$$

where Pr is the probability term for the hypergeometric distribution,
as defined in equation (3.18).

When $\alpha_U$ is chosen in advance, equation (3.19) requires in general
a non-integral value of $\hat{A}_U$ to satisfy it, whereas conceptually
$\hat{A}_U$ should be a whole number. In practice we choose $\hat{A}_U$ as
the smallest integral value of A such that the left side of (3.19) is less
than or equal to $\alpha_U$. Similarly, the lower confidence limit
$\hat{A}_L$ is the largest integral value such that

$$\sum_{j=a}^{n} \Pr(j, n-j/\hat{A}_L, N-\hat{A}_L) \le \alpha_L \tag{3.20}$$

Confidence limits for P are then found by taking $\hat{P}_U = \hat{A}_U/N$;
$\hat{P}_L = \hat{A}_L/N$.
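The integer limits defined by (3.19) and (3.20) can be found by a direct search over A. The following Python sketch does this by summing hypergeometric terms; the function names and the choice to return `None` when no value of A satisfies the condition (for example, the lower limit when a = 0) are assumptions of this sketch, not part of the text.

```python
from math import comb


def hypergeom_pmf(j, n, A, N):
    """Pr(j units in C in a sample of n), equation (3.18)."""
    return comb(A, j) * comb(N - A, n - j) / comb(N, n)


def upper_limit(a, n, N, alpha_u=0.025):
    """Smallest integer A with Pr(a or fewer in C) <= alpha_u, as in (3.19)."""
    for A in range(a, N + 1):
        tail = sum(hypergeom_pmf(j, n, A, N) for j in range(a + 1))
        if tail <= alpha_u:
            return A
    return None  # no A satisfies the condition (happens when a = n)


def lower_limit(a, n, N, alpha_l=0.025):
    """Largest integer A with Pr(a or more in C) <= alpha_l, as in (3.20)."""
    for A in range(N, -1, -1):
        tail = sum(hypergeom_pmf(j, n, A, N) for j in range(a, n + 1))
        if tail <= alpha_l:
            return A
    return None  # no A satisfies the condition (happens when a = 0)


# Limits for P follow by dividing by N, e.g. upper_limit(a, n, N) / N.
```

The search is linear in N, which is adequate for the population sizes discussed here; a bisection on A would serve for very large N because the tail sums are monotone in A.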

The normal approximation is often serviceable. In theorem 3.2,
it was found that the standard error of p is

$$\sigma_p = \sqrt{\frac{N-n}{N-1}}\sqrt{\frac{PQ}{n}}$$

If it is assumed that p is normally distributed, with estimated standard
error

$$s_p' = \sqrt{\frac{N-n}{N-1}}\sqrt{\frac{pq}{n}}$$

we obtain, as a normal approximation to the confidence limits,

$$\hat{P}_L = p - t\sqrt{\frac{N-n}{N-1}}\sqrt{\frac{pq}{n}}; \qquad \hat{P}_U = p + t\sqrt{\frac{N-n}{N-1}}\sqrt{\frac{pq}{n}}$$

where t is the normal deviate corresponding to the confidence
probability.*

It is worth while to amend these formulas by inserting a correction
for continuity, whenever this correction has an appreciable effect.
The rationale of the correction may be explained as follows. Suppose
that 30 units out of 70 are observed to fall in class C, and we wish to
approximate $\hat{P}_U$. Using the exact distribution, we would find
$\hat{P}_U$ such that the sum of the probabilities that 0, 1, ..., 30 units
fall in C is $\alpha_U$. If the exact distribution is to be approximated by a
continuous normal distribution, it is natural to regard the ordinate at 30
as corresponding to the area of the normal curve between 29½ and 30½.
Thus the sum of terms from 0 to 30 corresponds to the area of the
normal curve below the point 30½. The effect of the correction is to
compute $\hat{P}_U$ by the equation $\hat{P}_U = p + 1/(2n) + ts_p'$. This
increases $\hat{P}_U$ by 1/(2n). Similarly, $\hat{P}_L$ is decreased by
1/(2n). The amended limits may be written thus:

$$p \pm \left[\,t\sqrt{\frac{N-n}{N-1}}\sqrt{\frac{pq}{n}} + \frac{1}{2n}\right] \tag{3.21}$$

The correction for continuity produces a slight rather than a
substantial improvement in the approximation. However, without the
correction, the confidence interval as found by the normal approximation
is usually too narrow, and the correction helps to remedy this
defect.

* In theorem 3.3 it was shown that an unbiased sample estimate of
$\sigma_p^2$ is

$$s_p^2 = \frac{(N-n)}{N(n-1)}\,pq$$

In estimating $\hat{P}_L$ and $\hat{P}_U$, $s_p$ might have been used for the
standard error of p instead of $s_p'$. However, $s_p'$ was preferred because
it is more familiar, and both estimates appear to give about equally good
approximations.

The error in the normal approximation depends on all the quantities
n, p, N, $\alpha_U$, and $\alpha_L$. The quantity to which the error is most
sensitive is np, or more specifically the number observed in the smaller
class. Table 3.3 gives working rules for deciding when the normal
approximation (3.21) may be used.

TABLE 3.3 SMALLEST VALUES OF np FOR USE OF THE NORMAL APPROXIMATION

             np = number observed
   p         in the smaller class        n = sample size
  0.5                15                        30
  0.4                20                        50
  0.3                24                        80
  0.2                40                       200
  0.1                60                       600
  0.05               70                      1400
  ~0*                80                        ∞

* This means that p is extremely small, so that np follows the Poisson distribution.

The rules in table 3.3 are constructed so that with 95 per cent
confidence limits the true frequency with which the limits fail to enclose
P is not greater than 5.5 per cent. Further, the probability that the
upper limit is below P is between 2.5 and 3.5 per cent, and the
probability that the lower limit exceeds P is between 2.5 and 1.5 per cent.
These restrictions on the one-tailed frequencies of error seemed
advisable because the binomial distribution is in general skew (see
section 2.8). The rules are not guaranteed to satisfy these probability
statements in all cases, since exhaustive examination is lengthy, but
I believe that the statements are generally true. The choice of 5.5,
3.5, and 1.5 per cent for the probabilities is of course arbitrary. The
reader who is content with greater error in the normal approximation
can allow lower values of n.

When the situation lies outside the range of validity of the normal
approximation, or when greater accuracy is desired, reference may be
made to charts of the confidence limits of the hypergeometric function
by Chung and DeLury (1950). These give the 90, 95, and 99 per
cent limits for P for population sizes of 500, 2500, and 10,000. Values
for intermediate population sizes may be obtained by interpolation.

An alternative when the normal approximation does not apply is
to use the limits for the binomial distribution, adjusted if necessary
so as to take account of the finite population correction. For n less
than 50, the binomial limits are quickly found from Tables of the
binomial frequency distribution (U. S. National Bureau of Standards).
A convenient table of the limits themselves, constructed by W. L.
Stevens, is given in Fisher and Yates’s Statistical tables, 3rd ed., table
VIII, 1. The limits presented by Stevens are those for nP, since this
quantity is more amenable to compact tabulation than P itself. The
method of amending these limits so as to allow for the finite population
correction is illustrated in example 2 below.

Example 1. In a simple random sample of size 100, from a population
of size 500, there are 37 units in class C. Find the 95 per cent
confidence limits for the proportion and for the total number in class
C in the population. In this example,

n = 100; N = 500; p = 0.37

The example lies in the range in which the normal approximation is
recommended. The estimated standard error of p is

$$\sqrt{\frac{N-n}{N-1}}\sqrt{\frac{pq}{n}} = \sqrt{\frac{400}{499}}\sqrt{\frac{(0.37)(0.63)}{100}} = 0.0432$$

The correction for continuity, 1/(2n), equals 0.005. Hence the 95 per
cent limits for P are estimated as

$$p \pm \left[\,t\sqrt{\frac{(N-n)}{(N-1)}\,\frac{pq}{n}} + \frac{1}{2n}\right] = 0.37 \pm (1.96 \times 0.0432 + 0.005) = 0.37 \pm 0.090$$

$$\hat{P}_L = 0.280; \qquad \hat{P}_U = 0.460$$

The limits as read from the charts by Chung and DeLury are 0.285
and 0.462, respectively.

To find limits for the total number in class C in the population, we
multiply by N, obtaining 140 and 230, respectively.

Example 2. This example shows how binomial limits may be used
as an approximation. Suppose that for another item in the previous
sample 9 units out of the 100 fall in class C. This is outside the range
for the normal approximation.

The 95 per cent binomial limits for the expected number nP in
class C are read from Fisher and Yates's Statistical tables, table VIII,
1, as 4.20 and 16.40. Dividing by n, we obtain approximate limits of
0.042 and 0.164 for P. If the sampling ratio is less than 5 per cent,
limits found in this way are close enough for most purposes. In this
sample, the sampling ratio is 20 per cent, so that the fpc should be
applied. The fpc factor is

$$\sqrt{\frac{N-n}{N-1}} = \sqrt{\frac{400}{499}} = 0.895$$

To apply the correction, we shorten the interval between p and each
limit by this factor. The adjusted limits are as follows:

$$\hat{P}_L = 0.09 - (0.895)(0.09 - 0.042) = 0.047$$

$$\hat{P}_U = 0.09 + (0.895)(0.164 - 0.09) = 0.156$$

The limits obtained from the tables by Chung and DeLury are 0.045
and 0.157, respectively.
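Both examples reduce to a few lines of arithmetic, sketched here in Python. `normal_limits` applies the amended normal limits (3.21), and `shrink_binomial_limits` applies the fpc adjustment of example 2 to binomial limits that have already been read from a table. The function names are illustrative, and the binomial limits 0.042 and 0.164 are taken from the text, not computed.

```python
import math


def fpc_factor(N, n):
    """Square root of (N - n)/(N - 1)."""
    return math.sqrt((N - n) / (N - 1))


def normal_limits(a, n, N, t=1.96):
    """Normal-approximation limits for P with continuity correction, (3.21)."""
    p = a / n
    se = fpc_factor(N, n) * math.sqrt(p * (1 - p) / n)
    half_width = t * se + 1 / (2 * n)
    return p - half_width, p + half_width


def shrink_binomial_limits(p, lower, upper, N, n):
    """Shorten the distance from p to each binomial limit by the fpc factor."""
    f = fpc_factor(N, n)
    return p - f * (p - lower), p + f * (upper - p)


# Example 1: 37 out of 100 from a population of 500.
lo1, hi1 = normal_limits(37, 100, 500)

# Example 2: 9 out of 100, binomial limits 0.042 and 0.164 from the table.
lo2, hi2 = shrink_binomial_limits(0.09, 0.042, 0.164, N=500, n=100)
```

To three decimals the results are 0.280 and 0.460 for example 1 and 0.047 and 0.156 for example 2, matching the worked values.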


3.7 Classification into more than two classes. Frequently, in the
presentation of results, the units are classified into more than two
classes. Thus a sample from a human population may be arranged
in fifteen 5-year age groups. Even when a question is supposed to
be answered by a simple “yes” or “no,” the results actually obtained
may fall into four classes: “yes,” “no,” “don’t know,” and “no answer.”
The extension of the theory to such cases will be illustrated
by the situation in which there are three classes.

We suppose that the number falling in the ith class is $A_i$ in the
population and $a_i$ in the sample, where

$$N = \sum A_i; \qquad n = \sum a_i; \qquad P_i = \frac{A_i}{N}; \qquad p_i = \frac{a_i}{n}$$

When the sample size n is small relative to all the $A_i$, the
probabilities $P_i$ may be considered effectively constant throughout the
drawing of the sample. The probability of drawing the observed sample is
given by the multinomial expression

$$\Pr(a_i) = \frac{n!}{a_1!\,a_2!\,a_3!}\,P_1^{a_1} P_2^{a_2} P_3^{a_3} \tag{3.22}$$

This is the appropriate extension of the binomial distribution, and is
a good approximation when the sampling fraction is small.

The correct expression for the probability of drawing the observed
sample is

$$\Pr(a_i/A_i) = \frac{{}_{A_1}C_{a_1}\cdot{}_{A_2}C_{a_2}\cdot{}_{A_3}C_{a_3}}{{}_N C_n} \tag{3.23}$$

This expression is a natural extension of equation (3.18), section 3.5,
for the hypergeometric distribution. A proof will not be given: the
result can be established by the method used in proving the
hypergeometric distribution.
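Expressions (3.22) and (3.23) can be compared numerically. The Python sketch below implements both for any number of classes; the function names and the class sizes used in the comparison (3000, 5000, and 2000 units) are hypothetical.

```python
from math import comb, factorial, prod


def multinomial_prob(a, P):
    """Multinomial probability of the class counts a, equation (3.22)."""
    coef = factorial(sum(a))
    for ai in a:
        coef //= factorial(ai)  # exact integer division at every step
    return coef * prod(Pi ** ai for Pi, ai in zip(P, a))


def multi_hypergeom_prob(a, A):
    """Exact probability of the class counts a without replacement, (3.23)."""
    n, N = sum(a), sum(A)
    return prod(comb(Ai, ai) for Ai, ai in zip(A, a)) / comb(N, n)


# Hypothetical large population: the two expressions nearly coincide.
A = (3000, 5000, 2000)
exact = multi_hypergeom_prob((1, 1, 1), A)
approx = multinomial_prob((1, 1, 1), [Ai / sum(A) for Ai in A])
```

With a sampling fraction this small the two probabilities differ only in the fourth decimal place, which illustrates why (3.22) is described as a good approximation.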


3.8 Confidence limits when there are more than two classes. Two
different cases must be distinguished.

Case I. We calculate

$$p = \frac{\text{Number in any one class in sample}}{n} = \frac{a_1}{n}$$

or

$$p = \frac{\text{Total number in a group of classes}}{n} = \frac{a_1 + a_2 + a_3}{n}, \text{ say}$$

For example, if the answers are classified into “yes,” “no,” “don’t
know,” and “no answer,” we might take p as the proportion in the
sample answering “yes,” or alternatively as the proportion in the
sample giving a definite answer, either “yes” or “no.” In either of
these situations, although the original classification contains more
than two classes, p itself is obtained from a subdivision of the n units
into only two classes. The theory already presented applies to this
case. Confidence limits are calculated as described in section 3.6.

Case II. Sometimes certain classes are omitted, p being computed 
from a breakdown of the remaining classes into two parts. For ex- 
ample, we might omit persons who did not know or gave no answer, 
and consider the ratio of number of “yes” answers to “yes” plus “no” 
answers. Ratios which are structurally of this type are often of in- 
terest in sample surveys. The denominator of such a ratio is not n, 
but some smaller number n’. 

The frequency distribution of p is more complicated than in Case I,
because both the numerator and denominator of p vary from one
sample to another, even though all samples have the same total
size n. This presents an obstacle to the calculation of confidence
limits. Most of the complications can be avoided by the device, common
in statistical theory, of calculating confidence limits from the
conditional distribution of p, given n and n’. In this method we
construct confidence statements which will be true, with the assigned
confidence probability, over all samples which have the same n and
n’ as the observed sample.

The reason why this device helps is that the conditional distribution
of p is obtained from an ordinary hypergeometric distribution,
as will be shown in the next section. To give the result more
precisely, suppose that

$$p = \frac{a_1}{a_1 + a_2}; \qquad n' = a_1 + a_2; \qquad n = a_1 + a_2 + a_3$$

so that $a_3$ is the number in the sample falling in classes in which we
are not at the moment interested. Then the conditional distribution
of $a_1$ and $a_2$ is the hypergeometric distribution when the sample is
of size n’ and the population of size $N' = A_1 + A_2$. In particular,
the conditional standard error of p is

$$\sigma_p = \sqrt{\frac{N'-n'}{N'-1}}\sqrt{\frac{PQ}{n'}}$$

The normal approximations to conditional confidence limits for
$P = A_1/(A_1 + A_2)$ are

$$p \pm \left[\,t\sqrt{\frac{N'-n'}{N'-1}}\sqrt{\frac{pq}{n'}} + \frac{1}{2n'}\right] \tag{3.24}$$

One difficulty remains. Although n’ is known from the sample results,
N’ is not known, so that equations (3.24) are not usable as they stand.
Failing any additional information about N’, one procedure is to assume
that n’/n = N’/N, i.e. to estimate N’ as Nn’/n. Substitution of this
value in (3.24) gives the approximate limits

$$\hat{P}_L = p - \left[\,t\sqrt{\frac{N-n}{N-(n/n')}}\sqrt{\frac{pq}{n'}} + \frac{1}{2n'}\right]$$

$$\hat{P}_U = p + \left[\,t\sqrt{\frac{N-n}{N-(n/n')}}\sqrt{\frac{pq}{n'}} + \frac{1}{2n'}\right] \tag{3.25}$$

The fpc is the only term affected by the approximation used for N’.
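The approximate conditional limits for $P = A_1/(A_1 + A_2)$, obtained by estimating N’ as Nn’/n, can be sketched in Python as follows. The function name `conditional_limits` and the numbers in the usage line are hypothetical; the formula is the normal approximation with continuity correction described in this section, and it is subject to the same working rules as table 3.3.

```python
import math


def conditional_limits(a1, a2, n, N, t=1.96):
    """Approximate limits for P = A1/(A1 + A2), equation (3.25).

    a1, a2 : sample counts in the two classes of interest
    n      : total sample size (including the omitted classes)
    N      : population size
    """
    n_prime = a1 + a2
    p = a1 / n_prime
    q = 1 - p
    # fpc with N' estimated as N * n'/n; algebraically (N - n)/(N - n/n').
    fpc = math.sqrt((N - n) / (N - n / n_prime))
    half_width = t * fpc * math.sqrt(p * q / n_prime) + 1 / (2 * n_prime)
    return p - half_width, p + half_width


# Hypothetical survey: 60 "yes", 40 "no", 20 others, from N = 1200.
lo, hi = conditional_limits(60, 40, n=120, N=1200)
```

Only the denominator n’ = 100 enters the standard error and the continuity correction; the 20 omitted units affect the result solely through the fpc.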


3.9 The conditional distribution of p. In this section we indicate how
the conditional distribution of $p = a_1/(a_1 + a_2)$ is obtained, and
present an illustration. With three classes, the probability of drawing
the observed sample has been given [equation (3.23)] as


$$\Pr(a_i/A_i) = \frac{{}_{A_1}C_{a_1}\cdot{}_{A_2}C_{a_2}\cdot{}_{A_3}C_{a_3}}{{}_N C_n} \tag{3.26}$$

In the conditional distribution, $n' = a_1 + a_2$ is fixed. All samples of
size n which do not have this value of n’ are ignored. Samples which
have this value retain the same relative probabilities as in equation
(3.26). To find the conditional distribution, we divide (3.26) by the
total probability for all permissible samples.

This total probability is simply the probability that n’ out of n
will fall in class 1 or 2. It is given by the hypergeometric distribution:

$$\frac{{}_{(A_1+A_2)}C_{(a_1+a_2)}\cdot{}_{A_3}C_{a_3}}{{}_N C_n} \tag{3.27}$$

On division of (3.26) by (3.27), the conditional distribution of the
sample is obtained as

$$\Pr(a_i/A_i, n') = \frac{{}_{A_1}C_{a_1}\cdot{}_{A_2}C_{a_2}}{{}_{(A_1+A_2)}C_{(a_1+a_2)}} = \frac{{}_{A_1}C_{a_1}\cdot{}_{A_2}C_{a_2}}{{}_{N'}C_{n'}} \tag{3.28}$$

where $N' = A_1 + A_2$, $n' = a_1 + a_2$. This is an ordinary hypergeometric
distribution for a sample of size n’ from a population of size N’.
As an illustration, consider a population which consists of the five
units A, B, C, D, E, which fall in three classes.

Class     A_i     Units denoted by
  1        1      A
  2        2      B, C
  3        2      D, E

With unrestricted random samples of size 3, we wish to estimate
$P = A_1/(A_1 + A_2)$, or in this case 1/3. Thus N = 5, and N’ = 3.

There are 10 possible samples of size 3, all with equal initial
probabilities. These will be grouped according to the value of n’.

n’ = 1
                                   Conditional
Sample           a_1    a_2    p   probability   (p - P)
ADE               1      0     1       1/3         2/3
BDE or CDE        0      1     0       2/3        -1/3

If samples are specified by the values of $a_1$, $a_2$, only two types are
obtainable: $a_1 = 1$, $a_2 = 0$; $a_1 = 0$, $a_2 = 1$. Their conditional
probabilities, 1/3 and 2/3, respectively, agree with the general expression
(3.28). Further,

$$E(p) = \tfrac{1}{3}$$

$$\sigma_p^2 = \left(\tfrac{1}{3}\right)\left(\tfrac{2}{3}\right)^2 + \left(\tfrac{2}{3}\right)\left(\tfrac{1}{3}\right)^2 = \tfrac{6}{27} = \tfrac{2}{9}$$

The estimate p is unbiased, and its variance agrees with the general
formula

$$\sigma_p^2 = \left(\frac{N'-n'}{N'-1}\right)\frac{PQ}{n'} = \left(\frac{3-1}{3-1}\right)\left(\frac{1}{3}\right)\left(\frac{2}{3}\right) = \frac{2}{9}$$
For n’ = 2 there are six possible samples, which give only two sets
of values of $a_1$, $a_2$.

n’ = 2
                                         Conditional
Sample                    a_1   a_2   p   probability   (p - P)
ABD, ABE, ACD, or ACE      1     1   1/2      2/3         1/6
BCD or BCE                 0     2    0       1/3        -1/3

The estimate is again unbiased, and its variance is

$$\sigma_p^2 = \left(\tfrac{2}{3}\right)\left(\tfrac{1}{6}\right)^2 + \left(\tfrac{1}{3}\right)\left(\tfrac{1}{3}\right)^2 = \tfrac{1}{18}$$

which may be verified from the general formula. Note that the variance
is only one-fourth of that obtained when n’ = 1. In a conditional
approach, the variance changes with the configuration of the
sample that was drawn.

For n’ = 3, there is only one possible sample, ABC. This gives the
correct population fraction, 1/3. The conditional variance of p is zero,
as is indicated by the general formula, which reduces to zero when
N’ = n’.

3.10 Exercises. 

3.1 For a population with N=6,4A=4,4' =2, work out the value of 
@ for all possible simple random samples of size 3. Verify the theorems given 
for the mean and variance of p = a/n. Verify that 


is an unbiased estimate of the variance of p. s 
-3.2 In a simple random sample of 200 from a population of 2000 colleges, 


120 colleges were in favor of & proposal, 57 were opposed, and 23 had no 
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opinion. Estimate 95 per cent confidence limits for the number of colleges 
in the population which favored the proposal. 

3.3 Do the results of the previous sample furnish conclusive evidence that 
the majority of the colleges in the population favored this proposal? 

3.4 A population with N = 7 consists of the elements $B_1$, $C_1$, $C_2$, $C_3$, $D_1$,
$D_2$, and $D_3$. A simple random sample of size 4 is taken in order to estimate
the proportion of C’s to C’s + D’s. Work out the conditional distributions 
of this proportion, p, and verify the formula for its conditional variance. 

3.5 In the previous exercise, what is the probability that a sample of size
4 contains $B_1$? Hence find the average variance of p over all simple random
samples of size 4, and verify your answer by the general formula. 

3.6 A simple random sample of 290 households was chosen from a city 
area containing 14,828 households. Each family was asked whether it owned 
or rented the house and also whether it had the exclusive use of an indoor 
toilet. Results were as follows: 


                               Owned         Rented
Exclusive use of toilet      Yes    No     Yes    No     Total
                             141     6     109    34      290


(i) For families which rent, estimate the percentage in the area who have ex- 
clusive use of an indoor toilet and give the standard error of your estimate; 
(ii) estimate the total number of renting families in the area who do not have 
exclusive indoor toilet facilities and give the standard error of this estimate. 

3.7 In a simple random sample of size 5 from a population of size 30, no 
units in the sample were in class C. By the hypergeometric distribution, find 
the upper limit to the number A of units in class C in the population,
corresponding to a one-tailed confidence probability of 95 per cent.

3.8 In sampling for an attribute that is rare, one method is to continue 
drawing a simple random sample until m units which possess the rare attribute 
have been found (Haldane, 1945), where m is decided in advance. If the fpc
can be ignored, show that the probability that the total sample required is
of size n is

$$\frac{(n - 1)!}{(m - 1)!\,(n - m)!}\,P^m Q^{n-m} \qquad (n \ge m)$$


where P is the frequency of the rare attribute. Find the average size of the 
total sample and show that p = (m — 1)/(n — 1) is an unbiased estimate of 
P. (For further discussion see Finney, 1949.) 
CHAPTER 4
THE ESTIMATION OF SAMPLE SIZE


4.1 A hypothetical example. In the planning of a sample survey, a
stage is always reached at which some decision must be made about
the size of the sample. The decision is an important one. Too large 
a sample implies a waste of resources, and too small a sample dimin- 
ishes the utility of the results. The decision cannot always be made 
satisfactorily, for often we do not possess enough information to be 
sure that our choice of sample size is the best one. Sampling theory 
provides a framework within which to think intelligently about the 
problem. 

A hypothetical example may bring out the steps involved in reaching 
a solution. An anthropologist is preparing to study the inhabitants of 
some island. Among other things, he wishes to estimate what per- 
centage of the inhabitants belongs to blood group O. Cooperation 
has been secured so that it is feasible to take a simple random sample. 
How large should the sample be? 

This question cannot be discussed without first receiving an answer 
to another question: How accurately does the anthropologist wish to 
know the percentage of people with blood group O? In reply, he states 
that he will be content if the percentage is correct to within ±5 per
cent, in the sense that, if the sample shows 43 per cent to have blood
group O, the percentage for the whole island is sure to lie between 38
and 48.

To avoid misunderstanding, it may be advisable to point out to the 
anthropologist that we cannot absolutely guarantee accuracy to within 
5 per cent except by measuring everyone. However large n is taken, 
there is a chance of a very unlucky sample which is in error by more 
than the desired 5 per cent. The anthropologist replies coldly that he 
is aware of this, that he is willing to take a 1 in 20 chance of getting an 
unlucky sample, and that all he asks for is the value of n instead of a 
lecture on statistics. 

We are now in a position to make a rough estimate of n. To simplify 
matters, the fpc is ignored, and the sample percentage p is assumed
normally distributed. Whether these assumptions are reasonable can 
be verified when the initial n is known. 

In technical terms, p is to lie in the range (P ± 5), except for a
1 in 20 chance. Since p is assumed normally distributed about P, it
will lie in the range (P ± 2σ_p), apart from a 1 in 20 chance.* Further,

$$\sigma_p = \sqrt{\frac{PQ}{n}}$$

Hence, we may put

$$2\sqrt{\frac{PQ}{n}} = 5 \quad \text{or} \quad n = \frac{4PQ}{25}$$

At this point a difficulty appears which is common to all problems
in the estimation of sample size. A formula for n has been obtained,
but n depends on some property of the population which is to be
sampled. In this instance the property is the quantity P which we would
like to measure. We must therefore ask the anthropologist if he can
give us some idea of the likely value of P. He replies that from
previous data on other ethnic groups, and from his speculations about
the racial history of this island, he will be surprised if P lies outside
the range 30 to 60 per cent.

This information is sufficient to provide a usable answer. For any
value of P between 30 and 60, the product PQ lies between 2100 and
a maximum of 2500 at P = 50. The corresponding n lies between 336
and 400. To be on the safe side, 400 is taken as the initial estimate
of n.

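The bounds quoted in this example can be checked numerically. The short Python sketch below is an illustration only: the function name `rough_n` is hypothetical, the factor 2 and the 5 per cent limit are those of the example, and P is stepped through whole percentages from 30 to 60.

```python
def rough_n(P, d=5.0, t=2.0):
    """Rough sample size n = t^2 * P * Q / d^2, with P, Q and d in per cent (fpc ignored)."""
    Q = 100.0 - P
    return t * t * P * Q / (d * d)

# Evaluate n over the range of P that the anthropologist considers plausible.
sizes = {P: rough_n(P) for P in range(30, 61)}
smallest = min(sizes.values())
largest = max(sizes.values())
print(smallest, largest)
```

Taking the largest value over the plausible range is the conservative choice made in the text.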
The assumptions made in this analysis can now be re-examined.
With n = 400 and a P between 30 and 60, the distribution of p should
be close to normal. Whether the fpc is required depends on the number
of people on the island. If the population exceeds 8000, the sampling
fraction is less than 5 per cent and no adjustment for fpc is called
for. The method of applying the readjustment, if it is needed, is
discussed in section 4.4.


* The factor 2 instead of the more correct factor 1.96 gives a small margin of
safety.


4.2 Analysis of the problem. The principal steps involved in the
choice of a sample size are as follows:

i. There must be some statement as to what is expected of the
sample. This statement may be in terms of desired limits of error, as
in the previous example, or in terms of some decision that is to be
made or action that is to be taken when the sample results are known.
The responsibility for framing the statement rests primarily with the
persons who wish to use the results of the survey, though they
frequently need guidance in putting their wishes into numerical terms.

ii. Some equation must be found which connects n with the desired 
precision of the sample. The equation will vary with the content of 
the statement of precision and with the kind of sampling that is en- 
visaged. One of the advantages of probability sampling is that it 
enables this equation to be constructed. 

iii. This equation will contain, as parameters, certain unknown 
properties of the population. These must be estimated in order to 
give specific results. 

iv. It often happens that data are to be published for certain major 
subdivisions of the population, and that desired limits of error are set 
up for each subdivision. A separate calculation is made for the n in 
each subdivision, and the total n is found by addition. 

v. More than one item or characteristic is usually measured in a 
sample survey: sometimes the number of items is large. If a desired 
degree of precision is prescribed for each item, the calculations lead to 
a series of conflicting values of n, one for each item. Some method 
must be found for reconciling these values. 

vi. Finally, the chosen value of n must be appraised to see whether 
it is consistent with the resources available to take the sample. This 
demands an estimation of the cost, labor, time, and materials required 
to obtain the proposed size of sample. It sometimes becomes apparent 
that n will have to be drastically reduced. A hard decision must then 
be faced—whether to proceed with a much smaller sample size, thus 
reducing precision, or to abandon efforts until more resources can be 
found. 

In succeeding sections, some of these questions are examined in 
more detail. 


4.3 The specification of precision. The statement of precision de- 
sired may be made by stating the amount of error which we are willing 
to tolerate in the sample estimates. This amount is determined, as 
best we can, in the light of the uses to which the sample results are to 
be put. Sometimes it is difficult to decide how much error should be 
tolerated, particularly when the results have several different uses. 
Suppose that we asked the anthropologist why he wished the per- 
centage with blood group O to be correct to 5 per cent, rather than, 
say, 4 or 6 per cent. He might reply that the blood group data are to 
be used primarily for racial classification. He strongly suspects that 
the islanders belong either to a racial type with a P of about 35 per
cent or to one with a P of about 50 per cent. An error limit of 5 per
cent in the estimate seemed to him small enough to permit classifica- 
tion into one of these types. He would, however, have no violent 
objection to 4 or 6 per cent limits of error. 

Thus the choice of a 5 per cent limit of error by the anthropologist 
was to some extent arbitrary. In this respect the example is typical 
of the way in which a limit of error is often decided upon. In fact, the 
anthropologist was more certain of what he wanted than many other 
scientists and many administrators will be found to be. When the 
question of desired degree of precision is first raised, such persons may 
confess that they have never thought about the question and have no 
ideas as to the answer. My experience has been, however, that after 
a little discussion they can frequently indicate at least roughly the 
size of a limit of error which appears reasonable to them. 

Further than this we may not be able to go in many practical situa- 
tions. Part of the difficulty is that not enough is known about the 
consequences of errors of different sizes as they affect the wisdom of 
practical decisions that are made from survey results. This subject 
deserves more study than it is currently receiving. As knowledge 
accumulates, the choice of a desired degree of precision will become 
easier. Even when the consequences of errors are known, however, 
there are many important surveys whose results are used by different 
people for different purposes, and some of the purposes are not fore- 
seen at the time when the survey is planned. Consequently, an ele- 
ment of guesswork is likely to be prominent in the specification of pre- 
cision for some time to come. 

If the sample is taken for a very specific purpose, e.g. for making a 
single “yes” or “no” decision, or for deciding how much money to 
spend on a certain venture, the precision needed can usually be stated 
in a more definite manner, in terms of the consequences of errors in 
the decision. A general approach to problems of this type is given in 
section 4.8, which, although in need of amplification, offers a logical
start on a solution.


4.4 The formula for n in sampling for proportions. The units are 
classified into two classes, C and C’. Some margin of error d in the 
estimated proportion p of units in class C has been agreed upon, as 
well as a small risk α which we are willing to incur that the actual
error is larger than d. That is, we want

$$\Pr(|p - P| \ge d) = \alpha$$

Simple random sampling is assumed, and p is taken as normally
distributed. From theorem 3.2, section 3.2,

$$\sigma_p = \sqrt{\frac{N - n}{N - 1}}\,\sqrt{\frac{PQ}{n}}$$

Hence the formula which connects n with the desired degree of
precision is

$$d = t\,\sqrt{\frac{N - n}{N - 1}}\,\sqrt{\frac{PQ}{n}}$$

where t is the abscissa of the normal curve which cuts off an area α at
the tails. Solving for n, we find

$$n = \frac{\dfrac{t^2 PQ}{d^2}}{1 + \dfrac{1}{N}\left(\dfrac{t^2 PQ}{d^2} - 1\right)} \qquad (4.1)$$

For practical use, an advance estimate p of P is substituted in this
formula. If N is large, a first approximation is

$$n_0 = \frac{t^2 pq}{d^2} \qquad (4.2)$$

Note that d and t enter the formulas only in their ratio. If

$$V = \frac{d^2}{t^2} = \text{Desired variance of the sample proportion}$$

we have

$$n_0 = \frac{pq}{V}$$

In practice we first calculate $n_0$. If $n_0/N$ is negligible, $n_0$ is a
satisfactory approximation to the n of equation (4.1). If not, it is apparent
on comparison of (4.1) and (4.2) that n is obtained as

$$n = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}} \qquad (4.3)$$

Example. In the hypothetical blood groups example, we had

d = 0.05;  p = 0.5;  α = 0.05;  t = 2

Thus

$$n_0 = \frac{(4)(0.5)(0.5)}{(0.0025)} = 400$$

Let us assume that there are only 3200 people on the island. The
fpc is needed, and we find

$$n = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}} = \frac{400}{1 + \dfrac{399}{3200}} = 356$$

The formula for $n_0$ holds also if d, p, and q are all expressed as
percentages instead of proportions. Since the product pq increases as p
moves toward 1/2, or 50 per cent, a conservative estimate of n is obtained
by choosing for p the value nearest to 1/2 in the range in which p is
thought likely to lie. If p seems likely to lie between 5 and 9 per cent,
for instance, we assume 9 per cent for the estimation of n.
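
The two steps of this section, first $n_0$ and then the fpc adjustment, can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed routine: the function name `n_for_proportion` and its arguments are hypothetical, and the final n is rounded up to a whole unit.

```python
import math

def n_for_proportion(p, d, t=2.0, N=None):
    """Sample size for a proportion under simple random sampling.

    p : advance estimate of the proportion P
    d : margin of error; t : normal deviate for the chosen risk
    N : population size; if None, the fpc is ignored
    Returns (n0, n) following equations (4.2) and (4.3).
    """
    n0 = t * t * p * (1.0 - p) / (d * d)
    if N is None:
        return n0, n0
    n = n0 / (1.0 + (n0 - 1.0) / N)
    return n0, n

# Blood group example: d = 0.05, p = 0.5, t = 2, N = 3200.
n0, n = n_for_proportion(p=0.5, d=0.05, t=2.0, N=3200)
print(round(n0), math.ceil(n))
```

With the values of the blood group example this reproduces the figures 400 and 356 obtained above.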

A good discussion of sample size for proportions, with a specific
application, is given by Cornfield (1951).


4.5 The formula for n with continuous data. If $\bar{y}$ is the average of
the observations from a simple random sample, we wish to have

$$\Pr(|\bar{y} - \bar{Y}| > d) = \alpha$$

where d is the chosen margin of error, and α a small probability. We
assume that $\bar{y}$ is normally distributed: from theorem 2.2, corollary 1,
its standard error is

$$\sigma_{\bar{y}} = \sqrt{\frac{N - n}{N}}\,\frac{S}{\sqrt{n}}$$

Hence

$$d = t\,\sqrt{\frac{N - n}{N}}\,\frac{S}{\sqrt{n}} \qquad (4.4)$$

This gives

$$n = \frac{\left(\dfrac{tS}{d}\right)^2}{1 + \dfrac{1}{N}\left(\dfrac{tS}{d}\right)^2}$$

As in the previous section, we take as a first approximation

$$n_0 = \left(\frac{tS}{d}\right)^2 = \frac{S^2}{V} \qquad (4.5)$$

This is adequate unless $n_0/N$ is appreciable, in which event we
compute n as

$$n = \frac{n_0}{1 + \dfrac{n_0}{N}} \qquad (4.6)$$


Example. In nurseries which produce young trees for sale, it is 
advisable to estimate, in late winter or early spring, how many healthy 
young trees are likely to be on hand, since this determines policy to- 
wards the solicitation and acceptance of orders. A study of sampling 
methods for the estimation of the total numbers of seedlings was under- 
taken by Johnson (1943). The data which follow were obtained from 
a bed of silver maple seedlings, 1 ft wide and 430 ft long. The sampling 
unit was 1 ft of the length of the bed, so that N = 430. By complete 
enumeration of the bed, it was found that $\bar{Y} = 19$, $S^2 = 85.6$, these
being the true population values.

With simple random sampling, how many units must be taken to
estimate $\bar{Y}$ within 10 per cent, apart from a chance of 1 in 20? From
equation (4.5) we obtain

$$n_0 = \frac{t^2 S^2}{d^2} = \frac{(4)(85.6)}{(1.9)^2} = 95$$

Since $n_0/N$ is not negligible, we take

$$n = \frac{95}{1 + \dfrac{95}{430}} = 78$$

Almost 20 per cent of the whole bed has to be counted in order to
attain the precision desired.
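
The same calculation for continuous data can be sketched in Python. The function name `n_for_mean` is hypothetical; the sketch carries $n_0$ forward without rounding, whereas the text rounds $n_0$ to 95 before applying the fpc, and both routes lead to n = 78 after rounding up.

```python
import math

def n_for_mean(S2, d, t=2.0, N=None):
    """Sample size for estimating a mean: equations (4.5) and (4.6)."""
    n0 = t * t * S2 / (d * d)
    if N is None:
        return n0, n0
    return n0, n0 / (1.0 + n0 / N)

# Seedling bed: N = 430, mean 19, S^2 = 85.6; margin of error 10 per cent of the mean.
n0, n = n_for_mean(S2=85.6, d=0.10 * 19, t=2.0, N=430)
print(round(n0, 1), math.ceil(n))
```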

The previous example is atypical in that the population value $S^2$ is
known. In practice, $S^2$ is estimated from previous sampling of a similar
or related population, or by intelligent guesswork. This points to
the value of publishing, or at least keeping accessible, records of 
standard deviations obtained in sample surveys, as a guide for future 
samples. 

Even with little guidance from previous work, a serviceable estimate 
of $S^2$ can often be made. For example, in early studies in the estimation
of wireworm populations in the soil, a tool was used which took a
small sample (e.g. 9 × 9 × 5 in. deep) of the topsoil. For estimation of
sample size, the sampler needed to know the standard deviation of the
number of wireworms which a sample would contain. If wireworms
were distributed at random over the topsoil, the number found in a
small volume would follow a Poisson distribution, for which $S^2 = \bar{Y}$.
Since there might be some tendency for wireworms to congregate, it
was decided to assume $S^2 = 1.2\bar{Y}$, the factor 1.2 being an arbitrary
safety factor. Although $\bar{Y}$ itself was not known, the values of $\bar{Y}$ that
are of economic importance could be delineated from studies of the
wireworm densities that are critical with respect to damage to crop 
growth. These two pieces of information made it possible to deter- 
mine sample sizes that proved satisfactory. Deming (1950) gives use- 
ful hints for estimating S from some knowledge of the range and shape 
of the distribution. 

The formulas for n given here apply only to simple random sampling
in which the sample mean is used as the estimate of $\bar{Y}$. For other
methods of sampling and estimation, the appropriate formulas will be
presented with the discussion of these techniques.

In the discussion thus far, we have specified that n must be large
enough so that

$$\Pr(|\bar{y} - \bar{Y}| > d) = \alpha$$

where d has been called the margin of error. Alternatively, the sample
size is sometimes specified as large enough to provide a confidence
interval of half-width d, with confidence probability (1 − α). Except
in small samples, these two methods of specification lead to the same
estimated value for n, as will now be shown.

From section 2.7, the half-width of the confidence interval is

$$d = t\,\sqrt{\frac{N - n}{N}}\,\frac{s}{\sqrt{n}} \qquad (4.7)$$

This equation is the same as equation (4.4), with s in place of S,
except that if n is found to be less than 30, the t-value in equation (4.7)
for d is obtained from Student's t-distribution, with (n − 1) degrees
of freedom, instead of from the normal distribution. Both methods
are approximations (see section 4.7).


4.6 Sample size with more than one item. In most surveys, information
is collected on more than one item. One method of determining
sample size is to specify margins of error for those items that are
regarded as most vital to the survey. An estimation of the sample size
needed is first made separately for each of these important items.

When the single-item estimations of n have been completed, it is
time to take stock of the situation. It may happen that the n's
required are all reasonably close. If the largest of the n's falls within
the limits of the budget, this n is selected. More commonly, there is 
a sufficient variation among the n’s required so that we are reluctant 
to choose the largest, either from budgetary considerations or because 
of the fact that this will give an overall standard of precision sub- 
stantially higher than was originally contemplated. In this event the 
desired standard of precision may perhaps be relaxed for certain of the 
items, in order to permit the use of a smaller value of n. 
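
The stock-taking described here can be sketched in Python. Everything in the block is hypothetical: the item names, the coefficients of variation, and the budget figure are invented for illustration. It uses the first approximation $n_0 = S^2/V$ of equation (4.5), with both S and the desired standard error expressed relative to the mean, so that $n_0$ is the squared ratio of the cv per unit to the desired cv of the estimate.

```python
import math

# Hypothetical items: (cv per unit, desired cv of the estimate), both in per cent.
items = {
    "item A": (40.0, 4.0),
    "item B": (90.0, 6.0),
    "item C": (250.0, 10.0),
}

def n0_from_cv(unit_cv, target_cv):
    """First approximation n0 = S^2 / V with both terms expressed as cv's (fpc ignored)."""
    return (unit_cv / target_cv) ** 2

# Sample size required by each item separately, and the largest of them.
required = {name: math.ceil(n0_from_cv(u, t)) for name, (u, t) in items.items()}
largest = max(required.values())

# cv that each item would attain if a smaller n, fixed by the budget, had to be used.
budget_n = 300
attained = {name: u / math.sqrt(budget_n) for name, (u, _) in items.items()}
print(required, largest, attained)
```

Comparing `attained` with the desired values shows which items would need a relaxed standard of precision, or would have to be dropped, if the largest n cannot be afforded.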

In some cases the n’s required for different items are so discordant 
that certain items must be dropped from the inquiry, because with the 
resources available the precision expected for these items is totally 
inadequate. The difficulty may be not merely one of sample size. 
Some items call for a different type of sampling from others. With 
populations that are sampled repeatedly, it is useful to amass infor- 
mation about those items which can be combined economically in a 
general survey, and those which necessitate special methods. As an
example, a classification of items into four types, suggested by
experience in regional agricultural surveys, is shown in table 4.1. In this
classification, a general survey means one in which the units are fairly
evenly distributed over some region, as for example by a simple random
sample.


TABLE 4.1 AN EXAMPLE OF DIFFERENT TYPES OF ITEM IN REGIONAL SURVEYS

Type   Characteristics of item                  Type of sampling needed

i      Widespread throughout the region,        A general survey with low
       occurring with reasonable frequency      sampling ratio.
       in all parts of the region.

ii     Widespread throughout the region,        A general survey, but with
       but with low frequency.                  a higher sampling ratio.

iii    Occurring with reasonable frequency      For best results, a stratified
       in most parts of the region, but         sample with different
       with more sporadic distribution,         intensities in different
       being absent in some parts and           parts of the region
       highly concentrated in others.           (chapter 5). Can sometimes
                                                be included in a general
                                                survey with some
                                                supplementary sampling.

iv     Distribution very sporadic, or           Not suitable for a general
       concentrated in a small part of          survey. Requires a sample
       the region.                              geared to its distribution.

4.7 Stein's method of two-stage sampling. If S or P is estimated
from previous data, or by guesswork, the calculations given for n do
not provide any assurance that the margin of error or the confidence
limits will be of the desired size, because the provisional estimates of
S and P may be wrong. For continuous data, Stein (1945) developed
a method which guarantees that the confidence interval will be no
larger than some stated amount. In this method, the information
about S is obtained from the population that is being sampled. Since
Stein's technique assumes that the parent population is normal, its
practical application is restricted to sampling in which this condition
is approximately satisfied. The method is, however, of great interest,
because it gives a chosen degree of precision which is not dependent
on the correctness of initial guesses.

The sample is taken in two parts. The first part, of size $n_1$, supplies
an estimate s of S, calculated in the usual way, and also a preliminary
estimate of the mean $\bar{Y}$. When the first part has been taken, Stein
shows how to calculate the number of additional observations needed 
in order to have a specified confidence interval. Both parts must be 
samples from the population that is being surveyed. If the population 
changes with time, the time interval between the first and second parts 
must be sufficiently small so that no appreciable change will have 
occurred. 

Since the method was developed for infinite populations, the case
where n/N is negligible will be considered first. When the first sample
has been obtained, a confidence interval for $\bar{Y}$ can be calculated. The
half-width of this interval is

$$ts/\sqrt{n_1}$$

where t denotes the t-value for ($n_1$ − 1) degrees of freedom and
confidence probability (1 − α). If this quantity is less than or equal to d,
the desired half-width, the sample is already sufficiently large. If the
quantity exceeds d, additional observations are taken so that the total
size of sample n is at least as great as

$$t^2 s^2/d^2$$

Then, if $\bar{y}$ is the mean of the whole sample,

$$\Pr(|\bar{y} - \bar{Y}| \ge d) \le \alpha \qquad (4.8)$$

Sketch of proof. The proof assumes that the observations, $y_1, y_2, \ldots, y_n$,
are normally distributed about $\bar{Y}$. Throughout the proof, d, α, and
$n_1$ are fixed quantities. The total sample size n is not fixed, but is a
random variate, since its value depends on the value of s that turns up
in the first sample. Nevertheless, for fixed s, n is fixed, and the quantity

$$\sqrt{n}\,(\bar{y} - \bar{Y})$$

is normally distributed with mean zero and variance $\sigma^2$. Hence, this
quantity follows the normal distribution whether s is fixed or not.
Moreover, by a well-known property of the normal distribution, the
distribution is independent of that of s. Consequently,

$$\sqrt{n}\,(\bar{y} - \bar{Y})/s$$

follows the t-distribution with ($n_1$ − 1) degrees of freedom. By
definition of t, it follows that

$$\Pr\left(\frac{\sqrt{n}\,|\bar{y} - \bar{Y}|}{s} \ge t\right) = \alpha \qquad (4.9)$$

This is the key result in the proof. Further, by the way in which the
value of n was calculated, we always have

$$\sqrt{n} \ge \frac{ts}{d}, \quad \text{or} \quad \frac{\sqrt{n}\,d}{s} \ge t \qquad (4.10)$$

Hence, from (4.9),

$$\Pr\left(\frac{\sqrt{n}\,|\bar{y} - \bar{Y}|}{s} \ge \frac{\sqrt{n}\,d}{s}\right) \le \alpha$$

or

$$\Pr(|\bar{y} - \bar{Y}| \ge d) \le \alpha$$
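
The guarantee can be examined empirically with a small simulation. The Python sketch below is an illustration under stated assumptions, not part of Stein's argument: the population is taken as normal and effectively infinite, the values 100 and 60 for its mean and standard deviation are invented, $n_1$ = 50 and d = 10, and t = 2.01 is the tabulated two-sided 5 per cent value for 49 degrees of freedom. The function name `stein_trial` is hypothetical.

```python
import math
import random
import statistics

def stein_trial(rng, mu, sigma, n1, d, t):
    """One run of the two-stage procedure; returns True if |ybar - mu| < d."""
    first = [rng.gauss(mu, sigma) for _ in range(n1)]
    s = statistics.stdev(first)                    # estimate of S from the first part
    n = max(n1, math.ceil((t * s / d) ** 2))       # total size, at least t^2 s^2 / d^2
    second = [rng.gauss(mu, sigma) for _ in range(n - n1)]
    ybar = statistics.fmean(first + second)        # mean of the whole sample
    return abs(ybar - mu) < d

rng = random.Random(12345)
trials = 2000
hits = sum(stein_trial(rng, 100.0, 60.0, 50, 10.0, 2.01) for _ in range(trials))
coverage = hits / trials                           # proportion of runs with error below d
print(coverage)
```

If the procedure behaves as (4.8) states, the proportion printed should be about 0.95 or somewhat higher, up to simulation error.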


Example. Suppose that d = 10, α = 0.05. From previous information,
S is guessed as about 60, although this guess may be seriously
in error. The first step in applying Stein's method is to select a value
of $n_1$, the size of the initial sample. For this, it is helpful to know how
large the final sample would have to be if the assumed value of S
happened to be correct. This value is

$$n = \left(\frac{tS}{d}\right)^2 = \left(\frac{2 \times 60}{10}\right)^2 = 144$$

where t is taken from the normal distribution. Suppose that $n_1$ is
taken as 50. This value gives a reasonably large number of degrees of 


freedom for estimating S and does not commit us to too large an initial 
sample in case S should turn out to be less than we feared. 


When the first sample is taken, $s^2$ is found to be 1938. Since t, for
49 df, is 2.01, we have

$$\frac{ts}{\sqrt{n_1}} = \frac{(2.01)(44.02)}{7.0711} = 12.51$$

so that the sample of 50 gives a confidence interval of half-width 12.51
instead of 10. Finally, n is chosen so that

$$n = \frac{t^2 s^2}{d^2} = \frac{(4.040)(1938)}{100} = 78.3$$

That is, 29 additional observations are taken to make the total n = 79.
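
The arithmetic of this example can be reproduced with a short Python sketch. The function name `stein_total_n` is hypothetical, and the t-value is supplied by the user from tables rather than computed.

```python
import math

def stein_total_n(s2, t, d, n1):
    """Half-width from the first part alone, and the total sample size required."""
    half_width = t * math.sqrt(s2 / n1)        # t * s / sqrt(n1)
    needed = math.ceil(t * t * s2 / (d * d))   # smallest integer n with n >= t^2 s^2 / d^2
    return half_width, max(n1, needed)

half_width, n = stein_total_n(s2=1938.0, t=2.01, d=10.0, n1=50)
additional = n - 50
print(round(half_width, 2), n, additional)
```

Taking the maximum with $n_1$ covers the case in which the first part is already large enough, so that no additional observations are called for.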


4.8 An attempt at a general solution of the sample size problem. 
This approach is most easily explained in the situation where certain 
practical decisions are to be made from the sample results. Such deci- 
sions will presumably be more fruitful if the sample estimate has a low 
error than if it has a high error. We may be able to calculate, in
monetary terms, the loss l(z) that will be incurred in a decision through an
error of amount z in the estimate. Although the actual value of z is not
predictable in advance, sampling theory enables us to find the
frequency distribution f(z, n) of z, which for a specified sampling method
will depend on the sample size n. Hence the expected loss for a given
size of sample is

$$L(n) = \int l(z)\,f(z, n)\,dz$$

The purpose in taking the sample is to diminish this loss. If C(n)
is the cost of a sample of size n, clearly n should be chosen so as to
minimize

$$C(n) + L(n)$$


since this is the total cost involved in taking the sample and in making 
decisions from its results. The choice of n determines both the opti- 
mum size of sample and the most advantageous degree of precision. 
Alternatively, the same approach can be presented in terms of the 
monetary gain which accrues from having the sample information, 
rather than in terms of the loss which arises from errors in the sample 
information. If monetary gain is used, we construct an expected gain 
G(n) from a sample of size n, where G(n) is zero if no sample is taken.
We maximize

$$G(n) - C(n)$$


In this form the principle is equivalent to the rule in classical eco- 
nomics that profit is to be maximized. 
The simplest application occurs when the loss function, l(z), is λz²,
where λ is a constant. It follows that

$$L(n) = \lambda E(z^2)$$

For instance, if $\bar{y}$ is the sample estimate of $\bar{Y}$, and $z = \bar{y} - \bar{Y}$,

$$L(n) = \lambda V(\bar{y}) = \frac{\lambda S^2}{n}$$

if simple random sampling is used and the fpc is ignored.
The simplest type of cost function for the sample is

$$C(n) = c_0 + c_1 n$$

where $c_0$ is the overhead cost. By differentiation, the value of n which
minimizes cost plus loss is

$$n = \sqrt{\frac{\lambda S^2}{c_1}}$$



A more general form of this result is given by Yates (1949). The same 
analysis applies to any method of sampling and estimation in which 
the variance of the estimate is inversely proportional to n and the cost 
is a linear function of n. 
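
The result of the differentiation can be checked numerically. In the Python sketch below the values of λ, S², $c_0$, and $c_1$ are hypothetical, chosen only so that the optimum falls on a whole number, and the function names are likewise invented for illustration.

```python
import math

def optimum_n(lam, S2, c1):
    """Value of n minimizing c0 + c1*n + lam*S2/n, namely sqrt(lam * S2 / c1)."""
    return math.sqrt(lam * S2 / c1)

def total_cost(n, lam, S2, c0, c1):
    """Cost of the sample plus expected loss, with quadratic loss and the fpc ignored."""
    return c0 + c1 * n + lam * S2 / n

# Hypothetical values chosen only for illustration.
lam, S2, c0, c1 = 50.0, 400.0, 100.0, 2.0
n_star = optimum_n(lam, S2, c1)

# Brute-force check over integer sample sizes from 1 to 1000.
best_int = min(range(1, 1001), key=lambda n: total_cost(n, lam, S2, c0, c1))
print(n_star, best_int)
```

The overhead cost $c_0$ shifts the total cost but does not affect the position of the minimum, which is why it is absent from the formula for n.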

Blythe (1945) describes the application of this principle to the 
estimation of the volume of timber in a lot for selling purposes (see 
exercise 4.6). Nordin (1944) discusses the optimum size of sample for 
estimating potential sales in a market which a manufacturer intends 
to enter. If the sales can be forecast accurately, the amount of fixed 
equipment and the production per unit period can be allocated so as 
to maximize the manufacturer’s expected profit. 

Although the application of this technique is likely to be restricted 
to situations in which the sample is taken for a specific purpose, it 
seems probable that this approach to the problem of sample size has 
a number of fruitful applications which have not yet been realized. 

A related problem is the sampling of lots of articles in a mass-pro- 
duction process, in order to determine whether the lot is to be accepted 
or rejected on the basis of its estimated quality. Since the purpose of 
the sample is usually to lead to a single “yes” or “no” decision, the 
best sample size can be decided by examining the consequences of 
errors in the decision. Good introductory accounts of the techniques 
are given by Tippett (1950) and Deming (1950). 
4.9 Exercises.


4.1 A survey is to be made of the prevalence of the common diseases in a
large population. For any disease that affects at least 1 per cent of the in- 
dividuals in the population, it is desired to estimate the total number of cases 
with a coefficient of variation of not more than 20 per cent. (i) What size of 
simple random sample is needed, assuming that the presence of the disease 
can be recognized without mistakes? (ii) What size is needed if total cases 
are wanted separately for males and females, with the same precision? 

4.2 In a wireworm survey, the number of wireworms per acre is to be es- 
timated with a limit of error of 30 per cent, at the 95 per cent probability level, 
in any field in which wireworm density exceeds 200,000 per acre in the top 5 
in. of soil. The sampling tool measures 9 X 9 X 5 in. deep. Assuming that 
the number of wireworms in a single sample follows a distribution slightly 
more variable than the Poisson, we take $S^2 = 1.2\bar{Y}$. What size of simple
random sample is needed? (1 acre = 43,560 sq ft.) 

4.3 The following coefficients of variation per unit were obtained in a 
farm survey in Iowa, the unit being an area 1 mile square (data of R. J. Jessen):


                                Estimated cv
Item                                (%)
Acres in farms                       38
Acres in corn                        39
Acres in oats                        44
Number of family workers            100
Number of hired workers             110
Number of unemployed                317


A survey is planned to estimate acreage items with a cv of 2½ per cent and
numbers of workers (excluding unemployed) with a cv of 5 per cent. With
simple random sampling, how many units are needed? How well would this
sample be expected to estimate the number of unemployed?

4.4 By experimental sampling, the mean value of a random variate is to 
be obtained correct to 0.001, with confidence probability 95 per cent. The 
values of the random variate for the first 20 samples drawn are shown below. 


How many more samples are needed?


Sample       Value of          Sample       Value of
  no.     random variate         no.     random variate
   1          .0725              11          .0712
   2          .0755              12          .0748
   3          .0759              13          .0878
   4          .0739              14          .0710
   5          .0732              15          .0754
   6          .0843              16          .0712
   7          .0727              17          .0757
   8          .0769              18          .0737
   9          .0730              19          .0704
  10          .0727              20          .0723

4.5 If the loss function due to an error in $\bar{y}$ is $\lambda|\bar{y} - \bar{Y}|$ and if the cost
$C = c_0 + c_1 n$, show that with simple random sampling, ignoring the fpc, the
most economical value of n is

$$\left(\frac{\lambda S}{c_1\sqrt{2\pi}}\right)^{2/3}$$


4.6 (Adapted from Blythe, 1945.) The selling price of a lot of standing 
timber is UW, where U is the price per unit volume and W is the volume of 
timber on the lot. The number N of logs on the lot is counted, and the aver- 
age volume per log is estimated from a simple random sample of n logs. The 
estimate is made and paid for by the seller and is provisionally accepted by 
the buyer. Later, the buyer finds out the exact volume purchased, and the 
seller reimburses him if he has paid for more than was delivered. If he has 
paid for less than was delivered, the buyer does not mention the fact. 

Construct the seller's loss function. Assuming that the cost of measuring
n logs is cn, find the optimum value of n. The standard deviation of the
volume per log may be denoted by S, and the fpc ignored.
CHAPTER 5
STRATIFIED RANDOM SAMPLING 


5.1 Description. In stratified sampling, the population of N units is
first divided into subpopulations of $N_1, N_2, \ldots, N_L$ units, respectively.
These subpopulations are non-overlapping and together they comprise
the whole of the population, so that

$$N_1 + N_2 + \cdots + N_L = N$$


The subpopulations are called strata. To obtain the full benefit from
stratification, the values of the $N_h$ must be known. When the strata
have been determined, a sample is drawn from each stratum, the drawings
being made independently in different strata. The sample sizes
within the strata are denoted by $n_1, n_2, \ldots, n_L$, respectively.

If a simple random sample is taken in each stratum, the whole
procedure is described as stratified random sampling.

Stratification is a very common technique. There are many reasons 
for this; the principal ones are the following: 

i. If data of known precision are wanted for certain subdivisions of
the population, it is advisable to treat each subdivision as a
“population” in its own right.

ii. Administrative convenience may dictate the use of stratification;
e.g. the agency conducting the survey may have field offices, each of
which can supervise the survey for a part of the population.

iii. Sampling problems may differ markedly in different parts of the
population. With human populations, people living in institutions
(e.g. hotels, hospitals, prisons) are often placed in a different stratum
from people living in ordinary homes, because a different approach to
the sampling is appropriate for the two situations.

iv. Stratification may bring about a gain in precision in the estimates
of characteristics of the whole population. The basic idea is
that it may be possible to divide a heterogeneous population into
subpopulations, each of which is internally homogeneous. This is
suggested by the name strata, with its implication of a division into layers.
If each stratum is homogeneous, in that the measurements vary little
from one unit to another, a precise estimate of any stratum mean can
be obtained from a small sample in that stratum. As will be shown,
these estimates can then be combined into a precise estimate for the
whole population.

The theory of stratified sampling deals with the properties of the
estimates from a stratified sample and with the best choice of the
sample sizes $n_h$ so as to obtain maximum precision. In this development
it is taken for granted that the strata have already been constructed.
The prior problems of how to construct strata and of how many strata
there should be will be postponed to a later stage (section 5.15).


5.2 Notation. The suffix h denotes the stratum and i the unit within
the stratum. The notation is a natural extension of that previously
used. The following symbols all refer to stratum h:

$N_h$                                                            Total number of units
$n_h$                                                            Number of units in sample
$y_{hi}$                                                         Value obtained for the ith unit
$\bar{Y}_h = \sum_{i=1}^{N_h} y_{hi} / N_h$                      True mean
$\bar{y}_h = \sum_{i=1}^{n_h} y_{hi} / n_h$                      Sample mean
$S_h^2 = \sum_{i=1}^{N_h} (y_{hi} - \bar{Y}_h)^2 / (N_h - 1)$    True variance

Note that the divisor for the variance is $(N_h - 1)$.


5.3 Properties of the estimates. For the population mean per unit,
the simplest type of estimate appropriate to stratified sampling is $\bar{y}_{st}$
(st for stratified), where

$$\bar{y}_{st} = \frac{\sum_{h=1}^{L} N_h \bar{y}_h}{N} \tag{5.1}$$

where $N = N_1 + N_2 + \cdots + N_L$.

The estimate $\bar{y}_{st}$ is not in general the same as the sample mean, for
the sample mean, $\bar{y}$, can be written as

$$\bar{y} = \frac{\sum_{h=1}^{L} n_h \bar{y}_h}{n} \tag{5.2}$$

The difference is that in $\bar{y}_{st}$ the estimates from the individual strata
receive their correct weights $N_h/N$. It is evident that $\bar{y}$ coincides
with $\bar{y}_{st}$ provided that in every stratum

$$\frac{n_h}{n} = \frac{N_h}{N} \quad \text{or} \quad \frac{n_h}{N_h} = \frac{n}{N} = \text{constant}$$


This means that the sampling fraction is the same in all strata. This
stratification is described as stratification with proportional allocation
of the $n_h$. It gives a self-weighting sample. If numerous estimates
have to be made, a self-weighting sample is time-saving.
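
The distinction between (5.1) and (5.2) can be illustrated numerically. The sketch below is only an illustration: the stratum sizes, stratum sample means, and function names are hypothetical and are not taken from the text. It shows that the ordinary sample mean agrees with the stratified estimate under proportional allocation and differs from it otherwise.

```python
def stratified_mean(N_h, ybar_h):
    """Formula (5.1): stratum means weighted by N_h / N."""
    N = sum(N_h)
    return sum(Nh * yb for Nh, yb in zip(N_h, ybar_h)) / N


def sample_mean(n_h, ybar_h):
    """Formula (5.2): stratum means weighted by n_h / n."""
    n = sum(n_h)
    return sum(nh * yb for nh, yb in zip(n_h, ybar_h)) / n


# Hypothetical population of two strata and hypothetical sample means.
N_h = [200, 800]
ybar_h = [50.0, 10.0]

y_st = stratified_mean(N_h, ybar_h)
y_prop = sample_mean([20, 80], ybar_h)   # same sampling fraction in each stratum
y_equal = sample_mean([50, 50], ybar_h)  # unequal sampling fractions

print(y_st, y_prop, y_equal)  # 18.0 18.0 30.0
```

With equal sample sizes the unweighted sample mean over-represents the small stratum, which is why the weights $N_h/N$ are needed unless the sample is self-weighting.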

The principal properties of the estimate $\bar{y}_{st}$ are outlined in the
following theorems. The first two theorems apply to stratified sampling
in general and are not restricted to stratified random sampling; i.e. the
sample from any stratum need not be a simple random sample.


Theorem 5.1 If in every stratum the sample estimate $\bar{y}_h$ is unbiased,
then $\bar{y}_{st}$ is an unbiased estimate of the population mean $\bar{Y}$.

Proof:

$$E(\bar{y}_{st}) = E\left(\frac{\sum_{h=1}^{L} N_h \bar{y}_h}{N}\right) = \frac{\sum_{h=1}^{L} N_h \bar{Y}_h}{N}$$

since the estimates are unbiased in the individual strata. But the
population mean $\bar{Y}$ may be written

$$\bar{Y} = \frac{\sum_{h=1}^{L} \sum_{i=1}^{N_h} y_{hi}}{N} = \frac{\sum_{h=1}^{L} N_h \bar{Y}_h}{N}$$

This completes the proof.

Corollary. Since $\bar{y}_h$ is an unbiased estimate of $\bar{Y}_h$ for simple random
sampling within strata, $\bar{y}_{st}$ is an unbiased estimate of $\bar{Y}$ for stratified
random sampling.

Theorem 5.2 For stratified sampling, the variance of $\bar{y}_{st}$, as an
estimate of the population mean $\bar{Y}$, is

$$V(\bar{y}_{st}) = \frac{\sum_{h=1}^{L} N_h^2 V(\bar{y}_h)}{N^2} \tag{5.3}$$

where

$$V(\bar{y}_h) = E(\bar{y}_h - \bar{Y}_h)^2$$

There are two restrictions on the theorem: (i) $\bar{y}_h$ must be an unbiased
estimate of $\bar{Y}_h$, and (ii) the samples must be drawn independently in
different strata.

Proof:

$$\bar{y}_{st} - \bar{Y} = \frac{\sum N_h \bar{y}_h}{N} - \frac{\sum N_h \bar{Y}_h}{N} = \frac{1}{N} \sum N_h (\bar{y}_h - \bar{Y}_h)$$

where the sum extends over all strata. Note that the error $(\bar{y}_{st} - \bar{Y})$
in the estimate is now expressed as a weighted mean of the errors of
estimation which have been made within the individual strata. Hence

$$(\bar{y}_{st} - \bar{Y})^2 = \frac{\sum_h N_h^2 (\bar{y}_h - \bar{Y}_h)^2}{N^2} + \frac{2 \sum_{h<j} N_h N_j (\bar{y}_h - \bar{Y}_h)(\bar{y}_j - \bar{Y}_j)}{N^2}$$

where the right-hand term extends over all pairs of strata.

We now average over all possible samples. For any cross-product
term, we begin by keeping the sample in stratum h fixed, and average
over all samples in stratum j. Since sampling is independent in the
two strata, the possible samples in stratum j will be the same and have
the same probabilities, whatever sample has been drawn in stratum h.
But since $\bar{y}_j$ is assumed unbiased, the average of $(\bar{y}_j - \bar{Y}_j)$ is zero.
Hence, all cross-product terms vanish.

The squared terms give

$$V(\bar{y}_{st}) = \frac{\sum N_h^2 E(\bar{y}_h - \bar{Y}_h)^2}{N^2} = \frac{\sum N_h^2 V(\bar{y}_h)}{N^2}$$

The important point about this result is that the variance of $\bar{y}_{st}$
depends only on the variances of the estimates of the individual stratum
means $\bar{Y}_h$. If it were possible to divide a highly variable population
into strata such that all items had the same value within a stratum, we
could estimate $\bar{Y}$ without any error. Examination of the proof shows
that it is the use of the correct stratum weights $N_h$ which leads to this
result.


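Theorem 5.3 For stratified random sampling, the variance of the
estimate $\bar{y}_{st}$ is

$$V(\bar{y}_{st}) = \frac{1}{N^2} \sum_{h=1}^{L} N_h (N_h - n_h) \frac{S_h^2}{n_h} \tag{5.4}$$

Proof: Since $\bar{y}_h$ is an unbiased estimate of $\bar{Y}_h$, theorem 5.2 can be
applied. Further, by theorem 2.2, applied to an individual stratum,

$$V(\bar{y}_h) = \frac{S_h^2}{n_h} \, \frac{(N_h - n_h)}{N_h} \tag{5.5}$$

By substitution into the result of theorem 5.2, we obtain

$$V(\bar{y}_{st}) = \frac{1}{N^2} \sum_{h=1}^{L} N_h^2 V(\bar{y}_h) = \frac{1}{N^2} \sum_{h=1}^{L} N_h (N_h - n_h) \frac{S_h^2}{n_h}$$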


Some particular cases of this formula are given in the following
corollaries.

Corollary 1 If the sampling fractions $n_h/N_h$ are negligible in all
strata,

$$V(\bar{y}_{st}) = \frac{1}{N^2} \sum \frac{N_h^2 S_h^2}{n_h} = \sum \frac{W_h^2 S_h^2}{n_h} \tag{5.6}$$

where $W_h = N_h/N$ is the stratum weight. This is the appropriate
formula when finite population corrections can be ignored. With
stratified sampling, there is in general no single finite population
correction factor, the factors entering individually into each stratum.

Corollary 2 With proportional allocation, we substitute

$$n_h = \frac{n N_h}{N}$$

in (5.4). The variance reduces to

$$V(\bar{y}_{st}) = \sum \frac{N_h}{N} \, \frac{S_h^2}{n} \, \frac{(N - n)}{N} = \frac{(N - n)}{N} \sum \frac{W_h S_h^2}{n} \tag{5.7}$$

In this case there is a single fpc, $(N - n)/N$.

Corollary 3 If sampling is proportional and the variances in all
strata have the same value, $S_w^2$, we obtain the simple result

$$V(\bar{y}_{st}) = \frac{S_w^2}{n} \, \frac{(N - n)}{N} \tag{5.8}$$

Theorem 5.4 If $\hat{Y}_{st} = N\bar{y}_{st}$ is the estimate of the population total
Y, then

$$V(\hat{Y}_{st}) = \sum N_h (N_h - n_h) \frac{S_h^2}{n_h} \tag{5.9}$$

This follows at once from theorem 5.3.
Example. Table 5.1 shows the 1920 and 1930 numbers of inhabi- 
tants, in thousands, of 64 large cities in the United States. The data 


TABLE 5.1 SIZES OF 64 CITIES (IN 1000'S) IN 1920 AND 1930

      1920 size ($x_{hi}$)            1930 size ($y_{hi}$)
           Stratum                         Stratum
  h = 1          2                  1            2

   797     314   172   121        900     364   209   113
   773     298   172   120        822     317   183   115
   748     296   163   119        781     328   163   123
   734     258   162   118        805     302   253   154
   588     256   161   118        670     288   232   140
   577     243   159   116       1238     291   260   119
   507     238   153   116        573     253   201   130
   507     237   144   113        634     291   147   127
   457     235   138   113        578     308   292   100
   438     235   138   110        487     272   164   107
   415     216   138   110        442     284   143   114
   401     208   138   108        451     255   169   111
   387     201   136   106        459     270   139   163
   381     192   132   104        464     214   170   116
   324     180   130   101        400     195   150   122
   315     179   126   100        366     260   143   134

NOTE. Cities are arranged in the same order in both years.

Totals and sums of squares

                   1920                          1930
Stratum   $\sum x_{hi}$   $\sum x_{hi}^2$   $\sum y_{hi}$   $\sum y_{hi}^2$
   1          8,349         4,756,619         10,070          7,145,450
   2          7,941         1,474,871          9,498          2,141,720
were obtained by taking the cities which ranked from 5th to 68th in
the United States in total number of inhabitants in 1920. The cities 
are arranged in 2 strata, the first containing the 16 largest cities and 
the second the remaining 48 cities. 

The total 1930 number of inhabitants in all 64 cities is to be esti- 
mated from a sample of size 24. Find the standard error of the esti- 
mated total for (i) a simple random sample, (ii) a stratified random 
sample with proportional allocation, (iii) a stratified random sample 
with 12 units drawn from each stratum. 

This population resembles the populations of many types of business 
enterprise in that some units—the large cities—contribute very sub- 
stantially to the total and display much greater variability than the 
remainder. 

The stratum totals and sums of squares are given under table 5.1.
For the complete population in 1930, we find

$$Y = 19{,}568; \qquad S^2 = 52{,}448$$

The three estimates of Y are denoted by $\hat{Y}_{ran}$, $\hat{Y}_{prop}$, and $\hat{Y}_{equal}$.

i. For simple random sampling,

$$V(\hat{Y}_{ran}) = \frac{N^2 S^2}{n} \, \frac{(N - n)}{N} = \frac{(64)^2 (52{,}448)}{24} \left(\frac{40}{64}\right) = 5{,}594{,}453$$

from theorem 2.2, corollary 2. The standard error is

$$\sigma(\hat{Y}_{ran}) = 2365$$
ii. For the individual strata the variances are

$$S_1^2 = 53{,}843; \qquad S_2^2 = 5581$$

Note that the stratum with the largest cities has a variance nearly
10 times that of the other stratum.

In proportional allocation, we have $n_1 = 6$, $n_2 = 18$. From formula
(5.7), multiplying by $N^2$, we have

$$V(\hat{Y}_{prop}) = \frac{(N - n)}{n} \sum N_h S_h^2 = \frac{40}{24} \{(16)(53{,}843) + (48)(5581)\} = 1{,}882{,}293$$

$$\sigma(\hat{Y}_{prop}) = 1372$$

iii. For $n_1 = n_2 = 12$ we use the general formula (5.9):

$$V(\hat{Y}_{equal}) = \sum N_h (N_h - n_h) \frac{S_h^2}{n_h} = \frac{(16)(4)(53{,}843)}{12} + \frac{(48)(36)(5581)}{12} = 1{,}090{,}827$$

$$\sigma(\hat{Y}_{equal}) = 1044$$


In this example, equal sample sizes in the two strata are more pre- 
cise than proportional allocation. Both are greatly superior to simple 
random sampling. The 1920 data, not utilized here, will appear in 
later examples. 
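
The three variances in this example can be checked with a few lines of code. The following is a minimal sketch, assuming only the figures quoted in the example ($N_1 = 16$, $N_2 = 48$, $S_1^2 = 53{,}843$, $S_2^2 = 5581$, $S^2 = 52{,}448$, $n = 24$); the function names are illustrative and are not part of the text.

```python
from math import sqrt


def var_total_srs(N, n, S2):
    """Variance of the estimated total under simple random sampling."""
    return N * N * S2 / n * (N - n) / N


def var_total_stratified(N_h, n_h, S2_h):
    """Formula (5.9): variance of the estimated total, stratified random sampling."""
    return sum(N * (N - n) * S2 / n for N, n, S2 in zip(N_h, n_h, S2_h))


N_h = [16, 48]
S2_h = [53843, 5581]

v_ran = var_total_srs(64, 24, 52448)
v_prop = var_total_stratified(N_h, [6, 18], S2_h)    # proportional allocation
v_equal = var_total_stratified(N_h, [12, 12], S2_h)  # 12 units per stratum

for name, v in [("random", v_ran), ("proportional", v_prop), ("equal", v_equal)]:
    print(name, round(v), round(sqrt(v)))
```

Rounded to integers, the variances are 5,594,453, 1,882,293, and 1,090,827, with standard errors 2365, 1372, and 1044.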


5.4 The estimated variance and confidence limits. If a simple random
sample is taken within each stratum, an unbiased estimate of $S_h^2$
is (from theorem 2.4)

$$s_h^2 = \frac{1}{n_h - 1} \sum_{i=1}^{n_h} (y_{hi} - \bar{y}_h)^2 \tag{5.10}$$

Hence we obtain

Theorem 5.5 With stratified random sampling, an unbiased estimate
of the variance of $\bar{y}_{st}$ is

$$v(\bar{y}_{st}) = s^2(\bar{y}_{st}) = \frac{1}{N^2} \sum_{h=1}^{L} N_h (N_h - n_h) \frac{s_h^2}{n_h} \tag{5.11}$$


An alternative form for computing purposes is

$$s^2(\bar{y}_{st}) = \sum_{h=1}^{L} \frac{W_h^2 s_h^2}{n_h} - \sum_{h=1}^{L} \frac{W_h s_h^2}{N} \tag{5.12}$$

The second term on the right represents the reduction due to the fpc.

In order that this estimate can be computed, there must be at least 
two units drawn from every stratum. Estimation of the variance 
when stratification is carried to the point where only one unit is chosen 
per stratum is discussed in section 5.21. 

Corollary. In certain applications it is reasonable to suppose that
$S_h^2$ has the same value in all strata. From the analysis of variance of
the sample, a pooled estimate of this common variance is

$$s_w^2 = \frac{\sum_{h=1}^{L} \sum_{i=1}^{n_h} (y_{hi} - \bar{y}_h)^2}{n - L}$$

Since sampling is usually proportional in this situation, the estimated
variance of $\bar{y}_{st}$ takes the simple form (from theorem 5.3, corollary 3)

$$v(\bar{y}_{st}) = \frac{s_w^2}{n} \, \frac{(N - n)}{N}$$

with $(n - L)$ degrees of freedom.

The formulas for confidence limits are as follows:

Population mean: $\bar{y}_{st} \pm t\,s(\bar{y}_{st})$ (5.13)
Population total: $N\bar{y}_{st} \pm tN\,s(\bar{y}_{st})$ (5.14)

These formulas assume that $\bar{y}_{st}$ is normally distributed, and that
$s(\bar{y}_{st})$ is well determined, so that the multiplier t can be read from
tables of the normal distribution.

If only a few degrees of freedom are provided by each stratum, the
usual procedure for taking account of the sampling error attached to
a quantity like $s(\bar{y}_{st})$ is to read the t-value from the tables of Student’s
t instead of from the normal table. The distribution of $s(\bar{y}_{st})$ is in
general too complex to allow a strict application of this method. An
approximate method of assigning an effective number of degrees of
freedom to $s(\bar{y}_{st})$ is as follows (Satterthwaite, 1946):

We may write

$$s^2(\bar{y}_{st}) = \frac{1}{N^2} \sum g_h s_h^2, \quad \text{where} \quad g_h = \frac{N_h (N_h - n_h)}{n_h}$$

The effective number of degrees of freedom $n_e$ is

$$n_e = \frac{(\sum g_h s_h^2)^2}{\sum \dfrac{g_h^2 s_h^4}{n_h - 1}} \tag{5.15}$$

The value of $n_e$ always lies between the smallest of the values
$(n_h - 1)$ and their sum. The approximation takes account of the
fact that $S_h^2$ may vary from stratum to stratum, but it assumes that
the values $y_{hi}$ are normally distributed, and is worth while only if this
condition appears to hold.
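
As an illustration of (5.15), the sketch below computes the effective degrees of freedom for a two-stratum design. The stratum sizes and sample sizes are those of the city example with 12 units per stratum; the sample variances $s_h^2$ are hypothetical values chosen only for illustration, and the function name is an assumption of this sketch.

```python
def effective_df(N_h, n_h, s2_h):
    """Effective degrees of freedom n_e from formula (5.15)."""
    g = [N * (N - n) / n for N, n in zip(N_h, n_h)]
    numerator = sum(gh * s2 for gh, s2 in zip(g, s2_h)) ** 2
    denominator = sum(
        gh ** 2 * s2 ** 2 / (n - 1) for gh, s2, n in zip(g, s2_h, n_h)
    )
    return numerator / denominator


N_h = [16, 48]
n_h = [12, 12]
s2_h = [50000.0, 6000.0]  # hypothetical sample variances

n_e = effective_df(N_h, n_h, s2_h)
lower = min(n - 1 for n in n_h)
upper = sum(n - 1 for n in n_h)
print(lower, round(n_e, 1), upper)
```

The computed value falls between the smallest $(n_h - 1)$, here 11, and their sum, here 22, as the text states it must.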


5.5 Optimum allocation. The problem of allocation concerns the
choice of the sample sizes $n_h$ in the respective strata. Theorem 5.6
gives an important result established by Tschuprow (1923) and
Neyman (1934).

Theorem 5.6 In stratified random sampling, the variance of the
estimated mean $\bar{y}_{st}$ is smallest, for a fixed total size of sample, if the
sample is allocated with $n_h$ proportional to $N_h S_h$.

Proof: The problem is to minimize

$$V(\bar{y}_{st}) = \frac{1}{N^2} \sum_{h=1}^{L} N_h (N_h - n_h) \frac{S_h^2}{n_h}$$

subject to the restriction

$$n_1 + n_2 + \cdots + n_L = n$$

We select the $n_h$ and the Lagrange multiplier $\lambda$ so as to minimize

$$V(\bar{y}_{st}) + \lambda (n_1 + n_2 + \cdots + n_L - n) = \frac{1}{N^2} \sum \left( \frac{N_h^2 S_h^2}{n_h} - N_h S_h^2 \right) + \lambda (n_1 + n_2 + \cdots + n_L - n)$$

Differentiating with respect to $n_h$, we obtain the equation

$$-\frac{N_h^2 S_h^2}{N^2 n_h^2} + \lambda = 0 \tag{5.16}$$

This gives

$$n_h = \frac{N_h S_h}{N \sqrt{\lambda}} \tag{5.17}$$

To find the actual value of $n_h$, add (5.17) over the strata. Thus

$$\sum n_h = n = \frac{\sum N_h S_h}{N \sqrt{\lambda}} \tag{5.18}$$

Substitution for $\sqrt{\lambda}$ into (5.17) gives

$$n_h = \frac{n N_h S_h}{\sum N_h S_h} \tag{5.19}$$

This result states that the sample size in a stratum should be
proportional to the product of the size of the stratum and the standard
deviation of the stratum, or in other words that the sampling fraction
$n_h/N_h$ should be proportional to the standard deviation. Other things
being equal, a larger sample is needed in a variable stratum.

An expression for the minimum variance itself is obtained by
substituting the values of $n_h$ given by (5.19) into the general formula for
$V(\bar{y}_{st})$. This gives

$$V_{min}(\bar{y}_{st}) = \frac{(\sum N_h S_h)^2}{N^2 n} - \frac{\sum N_h S_h^2}{N^2} \tag{5.20}$$
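
The allocation rule (5.19) and the minimum variance (5.20) are easy to evaluate numerically. The sketch below is an illustration only, using the 1930 stratum variances from the example in section 5.3 ($S_1^2 = 53{,}843$, $S_2^2 = 5581$) with $n = 24$; the function names are assumptions of this sketch.

```python
from math import sqrt


def neyman_allocation(n, N_h, S_h):
    """Formula (5.19): n_h proportional to N_h * S_h for fixed total n."""
    total = sum(N * S for N, S in zip(N_h, S_h))
    return [n * N * S / total for N, S in zip(N_h, S_h)]


def min_variance_of_mean(n, N_h, S_h):
    """Formula (5.20): variance of the estimated mean at the optimum."""
    N = sum(N_h)
    first = sum(Nh * S for Nh, S in zip(N_h, S_h)) ** 2 / (N * N * n)
    second = sum(Nh * S * S for Nh, S in zip(N_h, S_h)) / (N * N)
    return first - second


N_h = [16, 48]
S_h = [sqrt(53843), sqrt(5581)]

n_opt = neyman_allocation(24, N_h, S_h)
v_total = sum(N_h) ** 2 * min_variance_of_mean(24, N_h, S_h)

print([round(x, 2) for x in n_opt])  # [12.21, 11.79]
print(round(v_total))                # close to 1,090,157
```

The unrounded optimum is about 12.21 and 11.79 units, so after rounding it coincides with the equal allocation of 12 units per stratum, and the variance of the estimated total at the optimum is within a few units of 1,090,157.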
5.6 Optimum allocation with varying costs. A more general approach
is to consider optimum allocation for a specified total cost. For this
we need a cost function, which expresses the cost of taking the sample
in terms of the sample sizes $n_h$. We shall consider only the simple
function

$$\text{Cost} = C = a + \sum c_h n_h \tag{5.21}$$

Within any stratum, the cost mounts directly with the size of sample,
but the cost per unit, $c_h$, may vary from stratum to stratum. The
symbol a represents an overhead cost.

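Theorem 5.7 With a cost function of the form (5.21), the variance
of the estimated mean $\bar{y}_{st}$ is a minimum when $n_h$ is proportional to
$N_h S_h / \sqrt{c_h}$.

Proof: This is similar to that of theorem 5.6. We minimize

$$V(\bar{y}_{st}) + \lambda \left( a + \sum c_h n_h \right) = \frac{1}{N^2} \sum N_h (N_h - n_h) \frac{S_h^2}{n_h} + \lambda \left( a + \sum c_h n_h \right) \tag{5.22}$$

Differentiation with respect to $n_h$ gives the equation

$$-\frac{N_h^2 S_h^2}{N^2 n_h^2} + \lambda c_h = 0$$

i.e.,

$$n_h \sqrt{\lambda} = \frac{N_h S_h}{N \sqrt{c_h}} \tag{5.23}$$

Summing over all strata, we obtain

$$n \sqrt{\lambda} = \sum \frac{N_h S_h}{N \sqrt{c_h}} \tag{5.24}$$

Finally, the ratio of (5.23) to (5.24) gives

$$\frac{n_h}{n} = \frac{N_h S_h / \sqrt{c_h}}{\sum (N_h S_h / \sqrt{c_h})} \tag{5.25}$$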


This theorem leads to the following rules of conduct. In a given
stratum, take a larger sample if
(i) the stratum is larger,
(ii) the stratum is more variable,
(iii) sampling is cheaper in the stratum.

One further step is needed to complete the allocation. Equation
(5.25) gives the $n_h$ in terms of n, but we do not yet know what value
n has. The solution depends on whether the sample is chosen so as to
meet a specified total cost C or to give a specified variance V for $\bar{y}_{st}$.
If cost is fixed, we substitute the optimum values of $n_h$ in the cost
function (5.21) and solve for n. This gives

$$n = \frac{(C - a) \sum (N_h S_h / \sqrt{c_h})}{\sum N_h S_h \sqrt{c_h}}$$

If V is fixed, we substitute the optimum $n_h$ in the formula for $V(\bar{y}_{st})$.
We find

$$n = \frac{\left( \sum W_h S_h \sqrt{c_h} \right) \left( \sum W_h S_h / \sqrt{c_h} \right)}{V + \dfrac{1}{N} \sum W_h S_h^2}$$

where $W_h = N_h/N$.
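
The fixed-cost case can be sketched as follows. All numbers here (stratum sizes, standard deviations, unit costs, overhead, and budget) are hypothetical, and the function name is an assumption of this sketch; the code simply applies (5.25) and then scales n so that the cost function (5.21) is met exactly.

```python
from math import sqrt


def cost_optimal_allocation(C, a, N_h, S_h, c_h):
    """Allocate n_h proportional to N_h*S_h/sqrt(c_h), spending exactly C."""
    weights = [N * S / sqrt(c) for N, S, c in zip(N_h, S_h, c_h)]
    cost_sum = sum(N * S * sqrt(c) for N, S, c in zip(N_h, S_h, c_h))
    n = (C - a) * sum(weights) / cost_sum  # total sample size for fixed cost
    return [n * w / sum(weights) for w in weights]


# Hypothetical survey: the first stratum is more variable and costlier to sample.
N_h = [400, 600]
S_h = [10.0, 5.0]
c_h = [4.0, 1.0]
a, C = 100.0, 1100.0

n_h = cost_optimal_allocation(C, a, N_h, S_h, c_h)
spent = a + sum(c * n for c, n in zip(c_h, n_h))
print([round(x, 2) for x in n_h], round(spent, 2))
```

In this illustration the cheaper stratum receives the larger sample even though its standard deviation is smaller, which reflects rules (i) and (iii) outweighing rule (ii).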


5.7 Relative precision of stratified random and simple random sampling.
If intelligently used, stratification will nearly always result in
a smaller variance for the estimated mean or total than is given by a
comparable simple random sample. It is not true, however, that any
stratified random sample gives a smaller variance than a simple random
sample. If the values of the $n_h$ are far from optimum, stratified
sampling may have a higher variance. In fact, as will be shown, even
stratification with optimum allocation for fixed total sample size may
give a higher variance, though this result appears to be an academic
curiosity rather than something that is likely to happen in practice.

In this section a comparison is made between simple random sampling
and stratified random sampling with proportional and optimum
allocation.* This comparison helps to show in what way the gain due
to stratification is achieved. At first, the fpc is ignored, since this
provides a clean-cut result. More general results are given later.

The variances of the estimated means are denoted by $V_{ran}$, $V_{prop}$,
and $V_{opt}$, respectively.


Theorem 5.8 If terms in $1/N_h$ are ignored,

$$V_{opt} \le V_{prop} \le V_{ran} \tag{5.26}$$

where the optimum allocation is for fixed n, i.e. with $n_h \propto N_h S_h$.


* Interesting discussions of this question are given by Armitage (1947) and 
Evans (1951). 
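Proof: If the fpc is ignored,

$$V_{ran} = \frac{S^2}{n}$$

$$V_{prop} = \frac{\sum N_h S_h^2}{nN} \quad \text{[from equation (5.7), section 5.3]}$$

$$V_{opt} = \frac{(\sum N_h S_h)^2}{nN^2} \quad \text{[from equation (5.20), section 5.5]}$$

From the standard algebraic identity for the analysis of variance of
the stratified population, we have

$$(N - 1) S^2 = \sum_h \sum_i (y_{hi} - \bar{Y})^2 = \sum_h \sum_i (y_{hi} - \bar{Y}_h)^2 + \sum_h N_h (\bar{Y}_h - \bar{Y})^2 = \sum_h (N_h - 1) S_h^2 + \sum_h N_h (\bar{Y}_h - \bar{Y})^2 \tag{5.27}$$

Since terms in $1/N_h$ are negligible, this may be written

$$N S^2 = \sum_h N_h S_h^2 + \sum_h N_h (\bar{Y}_h - \bar{Y})^2$$

Hence

$$V_{ran} = \frac{S^2}{n} = \frac{\sum N_h S_h^2}{nN} + \frac{\sum N_h (\bar{Y}_h - \bar{Y})^2}{nN} = V_{prop} + \frac{\sum N_h (\bar{Y}_h - \bar{Y})^2}{nN} \tag{5.28}$$

By the definition of $V_{opt}$, we must have $V_{prop} \ge V_{opt}$. Their
difference is

$$V_{prop} - V_{opt} = \frac{1}{nN} \left[ \sum N_h S_h^2 - \frac{(\sum N_h S_h)^2}{N} \right] = \frac{1}{nN} \sum N_h (S_h - \bar{S})^2 \tag{5.29}$$

where $\bar{S} = \sum N_h S_h / N$. From (5.29) and (5.28),

$$V_{ran} = V_{opt} + \frac{\sum N_h (S_h - \bar{S})^2}{nN} + \frac{\sum N_h (\bar{Y}_h - \bar{Y})^2}{nN} \tag{5.30}$$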


This result shows that there are two components to the decrease in
variance as we change from simple random sampling to optimum
allocation. The first component (term on the extreme right) comes from
the elimination of differences among the stratum means; the second
(middle term on the right) from elimination of the effect of differences
among the stratum standard deviations. The second component
represents the difference in variance between optimum and proportional
allocation.

More general result. If the fpc cannot be neglected, the same type
of analysis leads to the result

$$V_{ran} = V_{prop} + \frac{(N - n)}{nN(N - 1)} \left[ \sum N_h (\bar{Y}_h - \bar{Y})^2 - \frac{1}{N} \sum (N - N_h) S_h^2 \right] \tag{5.31}$$

It follows that simple random sampling gives a higher variance than
proportional stratification unless

$$\sum N_h (\bar{Y}_h - \bar{Y})^2 < \frac{1}{N} \sum (N - N_h) S_h^2 \tag{5.32}$$


This case could happen, since the $\bar{Y}_h$ could all be identically equal. If
any differences among the $\bar{Y}_h$ exist, the inequality is unlikely to be
satisfied except with small strata, since the left side is of order $N_h$,
while the right is of order unity.
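
Relation (5.31) is an exact algebraic identity, so it can be verified on any stratified population. The sketch below does this for the 1930 city sizes, assuming only the stratum totals and sums of squares printed under table 5.1 and the sample size $n = 24$ from the example in section 5.3; the variable names are illustrative.

```python
# Stratum summaries for the 1930 city sizes (totals and sums of squares).
N_h = [16, 48]
sum_y = [10070, 9498]
sum_y2 = [7145450, 2141720]
n = 24

N = sum(N_h)
Ybar = sum(sum_y) / N
Ybar_h = [t / Nh for t, Nh in zip(sum_y, N_h)]
S2_h = [(q - t * t / Nh) / (Nh - 1) for q, t, Nh in zip(sum_y2, sum_y, N_h)]
S2 = (sum(sum_y2) - sum(sum_y) ** 2 / N) / (N - 1)

# Variances of the estimated mean with the fpc retained.
v_ran = (N - n) / N * S2 / n
v_prop = (N - n) / N * sum(Nh / N * s2 for Nh, s2 in zip(N_h, S2_h)) / n

# The two bracketed terms on the right-hand side of (5.31).
between = sum(Nh * (yb - Ybar) ** 2 for Nh, yb in zip(N_h, Ybar_h))
within = sum((N - Nh) * s2 for Nh, s2 in zip(N_h, S2_h)) / N
rhs = v_prop + (N - n) / (n * N * (N - 1)) * (between - within)

print(v_ran, rhs)        # the two values agree up to rounding error
print(between > within)  # True
```

For these data the between-strata term greatly exceeds the within-strata term, so inequality (5.32) is far from being satisfied and proportional stratification is the more precise method.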

The results reported here for optimum allocation are applicable to 
sampling practice only if optimum allocation can be achieved. The 
attempt to do so raises a number of problems that are discussed in 
succeeding sections. 


5.8 Effects of deviations from the optimum. For practical applications
the formula for the optimum allocation is not enough. We need
to know also when the gains in precision over proportional allocation
are likely to be substantial, and when they are likely to be small.
Further, since in practice the $S_h$ will not be known, we can attain only
an approximation to the optimum, so that we must know how much
precision is lost if the allocation departs to some extent from the
optimum. Some insight into these questions is obtained by considering
two strata, with the fpc ignored.

In this case the general formula for the variance of $\bar{y}_{st}$ is

$$V = \frac{1}{N^2} \left( \frac{N_1^2 S_1^2}{n_1} + \frac{N_2^2 S_2^2}{n_2} \right)$$

Optimum allocation, for fixed sample size, is obtained when

$$n_{1,opt} = \frac{n N_1 S_1}{N_1 S_1 + N_2 S_2}; \qquad n_{2,opt} = \frac{n N_2 S_2}{N_1 S_1 + N_2 S_2}$$

When these values are substituted into the general formula, we obtain

$$V_{opt} = \frac{(N_1 S_1 + N_2 S_2)^2}{N^2 n}$$

The relative precision (RP) of a general allocation as compared with
optimum allocation is given by

$$RP = \frac{V_{opt}}{V} = \frac{(N_1 S_1 + N_2 S_2)^2}{n \left( \dfrac{N_1^2 S_1^2}{n_1} + \dfrac{N_2^2 S_2^2}{n_2} \right)} = \frac{(N_1 U + N_2)^2}{n \left( \dfrac{N_1^2 U^2}{n_1} + \dfrac{N_2^2}{n_2} \right)} \tag{5.33}$$

putting $U = S_1/S_2$.

We want to examine this quantity for a series of values of $n_1$ and
$n_2$ in the neighborhood of the optimum. Departures of $n_1$ and $n_2$
from the optimum values can be expressed in terms of the ratio

$$\phi = \frac{n_1/n_2}{n_{1,opt}/n_{2,opt}} = \frac{n_1 N_2 S_2}{n_2 N_1 S_1} = \frac{n_1 N_2}{n_2 N_1 U}$$

Hence

$$\frac{n_1}{n_2} = \frac{\phi N_1 U}{N_2} \tag{5.34}$$

Since $n = n_1 + n_2$, we can rewrite equation (5.33) in terms of
$n_1/n_2$ as

$$RP = \frac{(N_1 U + N_2)^2}{\left( 1 + \dfrac{n_2}{n_1} \right) N_1^2 U^2 + \left( 1 + \dfrac{n_1}{n_2} \right) N_2^2}$$

Substitution for $n_1/n_2$ from (5.34) gives, after simplification,

$$RP = \frac{\phi (N_1 U + N_2)^2}{(\phi N_1 U + N_2)(N_1 U + \phi N_2)}$$

For given values of $\phi$ and U, computations show that the relative
precision is not sensitive to moderate departures in the ratio of the
stratum sizes $N_1/N_2$ from equality. Consequently, results will be
given for the situation in which $N_1 = N_2$. Figure 5.1 shows the RP
(in per cent) plotted against $\phi$ for three different values of $S_1/S_2 =
U = 2, 4, 8$. The scale for $\phi$ is logarithmic. On each curve the value
of $\phi$ which represents proportional allocation is shown.


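[Figure 5.1: relative precision (in per cent, vertical axis) plotted against
$\phi$, the ratio of $n_1/n_2$ to the optimum value of $n_1/n_2$ (horizontal axis,
logarithmic scale from 1/8 to 8), with one curve for each of
$U = S_1/S_2 = 2, 4, 8$. Points denoted by P show the relative precisions
with proportional allocation.]

Figure 5.1 Loss of precision through departures from optimum allocation.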


Careful study of figure 5.1 suggests the following conclusions:

i. The optimum is “flat,” in the sense that values of $n_1/n_2$ anywhere
between ½ and twice the optimum ratio lead to a loss of precision
which does not exceed 10 per cent for any of the three curves.

ii. If $S_1/S_2$ lies between 1 and 2, there seems little point in attempting
optimum allocation instead of proportional allocation. With $S_1/S_2
= 2$, the loss in precision by using proportional allocation is at most
10 per cent, and in practice is less, because we have only an approximate
optimum. (This conclusion requires modification if the sampling
fraction is high and the fpc is no longer negligible.)

iii. The relative precision of proportional allocation falls to 74 per
cent when $S_1/S_2 = 4$ and to 62 per cent when $S_1/S_2 = 8$. Thus, when
$S_1/S_2$ exceeds 2, we can have values of $n_1$ and $n_2$ that are quite far
removed from the optimum, and still give a worth-while increase in
precision over proportional allocation.

iv. At first sight it may appear surprising that the curve for U = 2
is the lowest and that for U = 8 the highest. This result is a consequence
of the abscissal variable which was chosen.
As a working rule, proportional allocation is usually to be recommended
unless the expected gain in precision from optimum allocation,
as estimated in advance of taking the sample, exceeds 20 per cent.
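
The figures quoted in these conclusions can be reproduced from the two-stratum formulas with the fpc ignored. The sketch below assumes the closed form $RP = \phi(N_1U + N_2)^2 / \{(\phi N_1U + N_2)(N_1U + \phi N_2)\}$, which follows from writing $V_{opt}/V$ in terms of $\phi$ and $U = S_1/S_2$; the function name and the default $N_1 = N_2$ are assumptions of this sketch. With equal stratum sizes, proportional allocation has $n_1 = n_2$ and hence $\phi = 1/U$.

```python
def relative_precision(phi, U, N1=1.0, N2=1.0):
    """V_opt / V for two strata, fpc ignored; phi = (n1/n2) / (optimum n1/n2)."""
    return phi * (N1 * U + N2) ** 2 / ((phi * N1 * U + N2) * (N1 * U + phi * N2))


for U in (2, 4, 8):
    rp_prop = relative_precision(1 / U, U)  # proportional allocation, N1 = N2
    rp_half = relative_precision(0.5, U)    # half the optimum ratio
    rp_twice = relative_precision(2.0, U)   # twice the optimum ratio
    print(U, round(100 * rp_prop), round(100 * rp_half), round(100 * rp_twice))
```

The proportional-allocation values come out at 90, 74, and 62 per cent for U = 2, 4, 8, and the loss at half or twice the optimum ratio never exceeds 10 per cent, in agreement with conclusions i to iii.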


5.9 Determination of the allocation from previous data. Optimum
allocation requires advance estimates of the stratum standard deviations
$S_h$. These may be obtainable from a previous survey or census.

Example. The example in section 5.3 illustrates the situation where
good previous data are at hand. In table 5.1 (p. 70), the 64 large cities
are divided into 2 strata on the basis of their 1920 numbers of inhabitants.
A stratified random sample of 24 cities is to be taken to estimate
the 1930 total population. Optimum allocation for fixed sample size
is desired. Since 1930 data would not be available when the 1930 sample
was being planned, we make the allocation from the 1920 values of
$S_h$. The calculations appear in table 5.2, with 1930 data included for
comparison.


TABLE 5.2 CALCULATION OF THE OPTIMUM ALLOCATION

                        1920 Data                        1930 Data
Stratum   $N_h$   $S_h$    $N_h S_h$   $n_h$     $S_h$    $N_h S_h$   $n_h$
   1       16     163.30    2612.80    11.56     232.04    3712.64    12.21
   2       48      58.55    2810.40    12.44      74.71    3586.08    11.79

Totals     64               5423.20    24.00               7298.72    24.00


The 1920 data indicate an $n_1$ of 11.56, as against a “true” optimum
of 12.21 for the 1930 data. When rounded to integers, both sets of
data give the same allocation: a sample size of 12 from each stratum.
Note, incidentally, that the $S_h$ are smaller in 1920 than in 1930. The
1920 data, although excellent for planning the allocation, give an
optimistic impression of the precision to be obtained in 1930. In practice
this factor should always be taken into consideration. If it were
known that city sizes had increased between 1920 and 1930, some
allowance should be made, in choosing the size of sample, for an
accompanying increase in standard deviation.

There is some loss in precision from the theoretical optimum
because the $n_h$ have to be rounded to integers. This loss seems to
be unimportant even with n as low as 6. The effect of this rounding
in the example is seen by substituting $n_1 = 12.21$, $n_2 = 11.79$, into the
general formula

$$V(\hat{Y}_{st}) = \sum N_h (N_h - n_h) \frac{S_h^2}{n_h}$$

for the variance of the estimated total. This gives a theoretical minimum
variance of 1,090,157. The variance actually attained by a 12,
12 allocation was worked out in the example in section 5.3 and was
found to be 1,090,827. The difference is trivial.

When an allocation is being planned, it is advisable to estimate the
apparent gain in precision relative to proportional allocation. In
section 5.8 we suggested that proportional allocation is to be preferred
in view of its self-weighting properties, unless the apparent gain due to
optimum allocation is at least 20 per cent. For the present data, the
comparison with proportional allocation was made in a previous example
(section 5.3). The relative precision turned out to be 1883/1091,
or 173 per cent. Note that a calculation of this kind inevitably
overestimates the gain in precision relative to proportional allocation,
because it assumes that the $S_h$ in the new survey will be in the same
proportions, from stratum to stratum, as in the previous data which
we used to compute the allocation. In the present example this
overestimation is negligible, because the 1920 allocation happened to be
the same as the 1930 allocation.


5.10 Effects of errors in the estimated $S_h$. In the previous example,
advance estimation of the $S_h$ proved completely successful for allocating
the sample. Where previous data are scanty or absent, there may
be doubt whether the forecasts of the $S_h$ are good enough to use as the
basis of an allocation.

An examination of the effect of errors in the estimated $S_h$ has been
made by Evans (1951). The analysis leads to a rule showing whether
an “estimated optimum” allocation is likely to be profitable relative
to proportional allocation.

Suppose that allocation is based on estimates $s_h$ of the $S_h$. That is,
we use

$$n_h = \frac{n N_h s_h}{\sum N_h s_h} \tag{5.35}$$

If the symbol (cv) denotes the coefficient of variation of the $s_h$,
assumed the same in all strata, Evans shows that the average increase
in $V(\bar{y}_{st})$ due to errors in the $s_h$ is approximately

$$V(\bar{y}_{st}) - V_{opt} \doteq \frac{(cv)^2}{nN^2} \left\{ \left( \sum N_h S_h \right)^2 - \sum N_h^2 S_h^2 \right\}$$

But, from equation (5.29),

$$V_{prop} - V_{opt} = \frac{1}{nN^2} \left\{ N \sum N_h S_h^2 - \left( \sum N_h S_h \right)^2 \right\}$$

Thus, on the average, the “optimum” stratification gives a smaller
variance than proportional allocation, if

$$(cv)^2 < \frac{N \sum N_h S_h^2 - (\sum N_h S_h)^2}{(\sum N_h S_h)^2 - \sum (N_h S_h)^2}$$

For computation this result is more conveniently expressed in terms
of the relative sizes of the strata, $W_h = N_h/N$.

$$(cv)^2 < \frac{\sum W_h S_h^2 - (\sum W_h S_h)^2}{(\sum W_h S_h)^2 - \sum (W_h S_h)^2} \tag{5.36}$$

In practice, we should want $(cv)^2$ to be definitely less than this value,
because, with $(cv)^2$ equal to this value, the chance that the supposed
“optimum” is superior is only of the order of 50-50.

Example. For the 1920 city size data (table 5.2, section 5.9) the
relevant figures are shown in table 5.3.


TABLE 5.3 ESTIMATION OF ALLOWABLE ERROR IN THE $S_h$

Stratum   $W_h$   $S_h$    $S_h^2$   $W_h S_h$   $W_h S_h^2$
   1      0.25    163.3    26,667     40.82        6,667
   2      0.75     58.6     3,428     43.95        2,571

The calculation proceeds as follows:

$$\sum W_h S_h^2 = 9238; \qquad \left( \sum W_h S_h \right)^2 = 7186; \qquad \sum (W_h S_h)^2 = 3598$$

$$(cv)^2 < \frac{9238 - 7186}{7186 - 3598} = \frac{2052}{3588} = 0.57, \quad \text{i.e.} \quad (cv) < 0.75$$

The calculation indicates that a coefficient of variation of 0.75 could
be allowed in the estimates of the $S_h$. This value is probably outside
the limits of the approximation used. It does suggest that very poor
estimates would suffice to make “optimum” allocation better than
proportional allocation, as is to be expected when the $S_h$ differ markedly.
Further examples are given by Evans.

Sukhatme (1935) has discussed the size of preliminary sample
needed for advance estimates of the $S_h$, when the choice is between
stratified and simple random sampling. He shows that a small
initial sample will usually give a high probability that the “optimum”
allocation will have the smaller variance. Evans (1951) shows how to
compute the size of preliminary sample needed to make “optimum”
allocation better, on the average, than proportional allocation. Note
that these studies assume a normal population: when the population
is not normal, the sizes of the preliminary samples will have to be
increased because the variance of $s_h^2$ is sensitive to departures from
normality (see section 2.9).
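
The allowable coefficient of variation for the 1920 city data can be recomputed directly from the stratum weights and standard deviations. The sketch below assumes the rule in the form $(cv)^2 < \{\sum W_hS_h^2 - (\sum W_hS_h)^2\} / \{(\sum W_hS_h)^2 - \sum (W_hS_h)^2\}$ and takes $W_h = 0.25, 0.75$ and $S_h = 163.3, 58.6$ from table 5.3; the function name is an assumption of this sketch.

```python
from math import sqrt


def max_cv_squared(W_h, S_h):
    """Upper bound on (cv)^2 below which the estimated optimum beats proportional."""
    a = sum(W * S * S for W, S in zip(W_h, S_h))     # sum of W_h * S_h^2
    b = sum(W * S for W, S in zip(W_h, S_h)) ** 2    # (sum of W_h * S_h)^2
    c = sum((W * S) ** 2 for W, S in zip(W_h, S_h))  # sum of (W_h * S_h)^2
    return (a - b) / (b - c)


bound = max_cv_squared([0.25, 0.75], [163.3, 58.6])
print(round(bound, 2))        # 0.57
print(round(sqrt(bound), 3))  # a little over 0.75
```

Working from the unrounded products gives a bound slightly above the hand calculation, but the conclusion is unchanged: the estimates of the $S_h$ could be very rough and the estimated optimum would still be preferable on the average.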


5.11 Allocation with several items. Since the allocation that is best 
for one item will not in general be best for another, some compromise 
must be reached in a survey with numerous items. The first step is to 
reduce the items that will be considered in the allocation to a relatively 
small number that are thought to be most important. If good previ- 
ous data are available, we can then compute the optimum allocation 
for each item separately, and see to what extent there is disagreement. 
In a survey of a specialized type, the correlations among the items 
may be high, and the allocations may differ relatively little. 

Example. Data given by Jessen (1942) illustrate a farm survey of 
this kind. The state of Iowa was divided into five geographic regions, 
each denoted by its major agricultural enterprise. Suppose that these 
regions are to be used as strata in a survey on dairy farming. The 
three items of most interest are the number of cows milked per day, 
the number of gallons of milk per day, and the total annual cash receipts
from dairy products. From a survey made in 1938, the estimated
standard deviations $s_h$ within strata are shown in table 5.4. We
shall assume, for the present, that the $s_h$ are the true standard deviations.
The $s_h$ apply to a single farm. In table 5.5 the optimum allocations
are given for the individual items in a sample of 1000 farms.


TABLE 5.4 STANDARD DEVIATIONS WITHIN STRATA

                                      $s_h$         $s_h$        $s_h$
Stratum             $W_h = N_h/N$     Cows          Gallons      Receipts for dairy
                                      milked        of milk      products ($)
Northeast dairy         0.197          4.6           11.7            332
Cash grain              0.191          3.4            9.8            357
Western livestock       0.219          3.3            7.0            246
Southern pasture        0.184          2.8            6.5            173
Eastern livestock       0.208          3.7            9.8            279
TABLE 5.5 SAMPLE SIZES WITHIN STRATA

                                          Allocation
                                       Optimum for
Stratum             Proportional   Cows   Gallons   Receipts   Average $m_h$
Northeast dairy         197         254     258       236          250
Cash grain              191         182     209       246          212
Western livestock       219         203     171       194          189
Southern pasture        184         145     134       115          131
Eastern livestock       208         216     228       209          218


Since the state contains over 200,000 farms, the fpc can be ignored.
Thus for any item

$$n_h = n \frac{W_h s_h}{\sum W_h s_h}$$

The individual optimum allocations differ only moderately from 
each other. With one exception, all three deviate in the same direc- 
tion from a proportional allocation. Thus, in the first stratum, propor- 
tional allocation suggests 197 farms, while the individual allocations 
lead to numbers between 236 and 258. The average of the optimum 
sample sizes for the three items, shown in the right-hand column, provides
a satisfactory compromise allocation.

Table 5.6 shows the expected sampling variances of $\bar{y}_{st}$, as given by
the individual optima, the compromise, and the proportional allocations.


TABLE 5.6 EXPECTED VARIANCES OF THE ESTIMATED MEAN

Type of allocation     Cows      Gallons    Receipts
Optimum                0.0127    0.0800      76.9
Compromise             0.0128    0.0802      77.6
Proportional           0.0131    0.0837      80.9


The formulas are as follows:

$$V_{opt} = \frac{\left(\sum W_h s_h\right)^2}{n}; \qquad V_{comp} = \sum \frac{(W_h s_h)^2}{m_h}; \qquad V_{prop} = \frac{\sum W_h s_h^2}{n}$$

The formula for $V_{comp}$ comes from the general formula for $V(\bar{y}_{st})$ in
theorem 5.3, corollary 1: the denominator $m_h$ is the sample size given in
the $h$th stratum by the compromise allocation.
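
The allocations in table 5.5 and the variances in table 5.6 can be checked numerically. The following is a minimal sketch in Python, not part of the original text. The function names are hypothetical, and the gallons figure for the Northeast dairy stratum is taken as 11.7, the value implied by the allocations in table 5.5.

```python
# Stratum weights W_h and standard deviations s_h, as read from table 5.4.
# Assumption: the Northeast dairy "gallons" entry is 11.7.
W = [0.197, 0.191, 0.219, 0.184, 0.208]
S = {
    "cows": [4.6, 3.4, 3.3, 2.8, 3.7],
    "gallons": [11.7, 9.8, 7.0, 6.5, 9.8],
    "receipts": [332, 357, 246, 173, 279],
}
COMPROMISE = [250, 212, 189, 131, 218]  # "Average" column of table 5.5
N_SAMPLE = 1000


def optimum_allocation(weights, sds, n):
    """n_h = n * W_h s_h / sum(W_h s_h), fpc ignored."""
    ws = [w * s for w, s in zip(weights, sds)]
    total = sum(ws)
    return [n * v / total for v in ws]


def v_opt(weights, sds, n):
    """Variance of the mean under the item's own optimum allocation."""
    return sum(w * s for w, s in zip(weights, sds)) ** 2 / n


def v_given_sizes(weights, sds, sizes):
    """Variance of the mean for arbitrary stratum sample sizes m_h."""
    return sum((w * s) ** 2 / m for w, s, m in zip(weights, sds, sizes))


def v_prop(weights, sds, n):
    """Variance of the mean under proportional allocation."""
    return sum(w * s * s for w, s in zip(weights, sds)) / n


for item, sds in S.items():
    alloc = [round(x) for x in optimum_allocation(W, sds, N_SAMPLE)]
    print(item, alloc,
          v_opt(W, sds, N_SAMPLE),
          v_given_sizes(W, sds, COMPROMISE),
          v_prop(W, sds, N_SAMPLE))
```

Small differences in the last digit from the printed tables are to be expected, since the printed weights are rounded.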

The compromise allocation gives almost as precise results as if it 
were possible to use separate optimum allocations for each item. What 
is more noteworthy is that proportional allocation, though consistently 
poorer than the compromise, is only slightly less precise than the com- 
promise or the individual optima. Further, table 5.6 overestimates 
the precision of the optima and of the compromise, since these alloca- 
tions were made from estimated rather than from true variances. This 
result illustrates the important principle (previously discussed) that up 
to a point the variance of the estimate does not increase much even if 
allocation departs quite substantially from the optimum. From the 
comparison in table 5.6, proportional allocation would be the recommended
choice. Had the $S_h$ differed more markedly among strata, the
compromise allocation might have been very satisfactory.

In surveys which cover a wider range of items, the allocations may 
differ more violently. The best compromise is then not so obvious. 
Although proportional allocation is often used in this situation, it may 
be possible to find a compromise which is superior. As a preliminary 
we need some criterion for a “best” compromise. One possibility is to
minimize the sum of the variances for the items, each divided by its
optimum variance. If the fpc is ignored, this amounts to finding $n_h$
which minimize

$$\sum{}' \; \frac{\sum_h (W_h S_h)^2 / n_h}{\left(\sum_h W_h S_h\right)^2 / n}$$

where $\sum'$ denotes summation over the items. It remains to be seen,
however, whether situations are common, in surveys of wide scope, in 
which the gain over proportional allocation is sufficient to offset the 
advantage of the “self-weighting” feature in the latter. 


5.12 Allocation requiring more than 100 per cent sampling. In the 
planning of an allocation it may happen that the formula for the optimum
produces an $n_h$ in some stratum that is larger than the corresponding
$N_h$. Consider the example on city sizes in section 5.9. A sample
of 24 cities, distributed between 2 strata, called for 12 cities out of 16
in the first stratum and 12 out of 48 in the second. Had the sample
size been 48, the allocation would demand 24 cities out of 16 in the
first stratum. The best that can be done is to take all cities in the
stratum, leaving 32 cities for the second stratum instead of the 24
postulated by the formula. This problem arises only when the overall
sampling fraction is substantial, and one stratum is much more variable 
than the others. It has occurred in practice on several occasions. 
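
The revision described above (take every unit in an oversampled stratum, then reallocate the remainder among the other strata) can be sketched as a short loop. This is an illustrative sketch only: the function name is hypothetical, and the standard deviations `S = [3.0, 1.0]` are assumed values chosen so that a sample of 24 splits 12 and 12 as in the city example; they are not taken from section 5.9.

```python
def capped_optimum_allocation(N, S, n):
    """Optimum allocation for fixed n, with n_h never exceeding N_h.

    N: stratum sizes N_h; S: stratum standard deviations (assumed positive);
    n: total sample size. Strata whose trial n_h exceeds N_h are sampled
    completely, and the rest of the sample is reallocated among the others.
    """
    if n > sum(N):
        raise ValueError("sample size exceeds population size")
    alloc = [0.0] * len(N)
    free = set(range(len(N)))   # strata not yet fixed at 100 per cent
    remaining = n               # sample size still to be distributed
    while free:
        total = sum(N[h] * S[h] for h in free)
        trial = {h: remaining * N[h] * S[h] / total for h in free}
        over = [h for h in free if trial[h] > N[h]]
        if not over:
            for h in free:
                alloc[h] = trial[h]
            break
        for h in over:
            alloc[h] = float(N[h])
            remaining -= N[h]
            free.remove(h)
    return alloc


print(capped_optimum_allocation([16, 48], [3.0, 1.0], 24))
print(capped_optimum_allocation([16, 48], [3.0, 1.0], 48))
```

With these assumed inputs the second call takes all 16 cities in the first stratum and assigns the remaining 32 to the second.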

Care must be taken to use the correct formula in predicting the
expected variance from this allocation, or in comparing the allocation
with others. The general formulas in theorem 5.3, section 5.3, are
appropriate if the $n_h$ given by the revised optimum allocation are
substituted. Formula (5.20) for the minimum variance for fixed $n$,

$$V_{min}(\bar{y}_{st}) = \frac{1}{n}\frac{\left(\sum N_h S_h\right)^2}{N^2} - \frac{\sum N_h S_h^2}{N^2}$$

is no longer correct. If stratum 1 is the only stratum in which oversampling
is indicated, the correct formula for $V_{min}$ becomes

$$V_{min}(\bar{y}_{st}) = \frac{\left(\sum' N_h S_h\right)^2}{N^2 (n - N_1)} - \frac{\sum' N_h S_h^2}{N^2} \qquad (5.37)$$

where $\sum'$ denotes summation over all strata except stratum 1.


5.13 Estimation of sample size. This question might have been discussed
earlier in this chapter. On the other hand, since the formulas
for the variances of the estimated mean and total contain both the $n_h$
and the $S_h$, sample size may not be decided in practice until advance
estimates of the $S_h$ are available and some decision about allocation
has been made. In the following discussion we assume that the precision
desired is stated in terms either of the margin of error, or of the
half-width of confidence interval, $d$. For estimation of the population
mean, the basic equation to be satisfied is

$$d^2 = t^2 V(\bar{y}_{st}) \qquad (5.38)$$

where $t$ is the normal deviate corresponding to the desired confidence
probability. Since $d$ and $t$ enter the solution only in the form $d^2/t^2$,
we shall write $V = d^2/t^2$, where $V$ is the desired variance in the sample
estimate.

Let $s_h$ be the estimate of $S_h$ and let $n_h = w_h n$, where the $w_h$ have
been chosen. In these terms the expected $V(\bar{y}_{st})$ is (from theorem 5.3,
section 5.3)

$$\text{Expected } V(\bar{y}_{st}) = \frac{1}{N^2}\sum \frac{N_h^2 s_h^2}{n_h} - \frac{1}{N^2}\sum N_h s_h^2$$

i.e.,

$$\text{Expected } V(\bar{y}_{st}) = \frac{1}{n}\sum \frac{W_h^2 s_h^2}{w_h} - \frac{1}{N}\sum W_h s_h^2 \qquad (5.39)$$

with $W_h = N_h/N$. Equating this to the desired variance $V$ gives, as a
general formula for $n$,

$$n = \frac{\sum W_h^2 s_h^2 / w_h}{V + \dfrac{1}{N}\sum W_h s_h^2} \qquad (5.40)$$

If the fpc is ignored we have, as a first approximation,

$$n_0 = \frac{1}{V}\sum \frac{W_h^2 s_h^2}{w_h} \qquad (5.41)$$

If $n_0/N$ is not negligible, we may calculate $n$ as

$$n = \frac{n_0}{1 + \dfrac{1}{NV}\sum W_h s_h^2} \qquad (5.42)$$

In particular cases the formulas take various forms which may be
more convenient for computation. A few are given.

Presumed optimum allocation (for fixed $n$): $w_h \propto W_h s_h$.

$$n = \frac{\left(\sum W_h s_h\right)^2}{V + \dfrac{1}{N}\sum W_h s_h^2} \qquad (5.43)$$

Proportional allocation: $w_h = W_h = N_h/N$.

$$n_0 = \frac{\sum W_h s_h^2}{V}; \qquad n = \frac{n_0}{1 + \dfrac{n_0}{N}} \qquad (5.44)$$

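If $V$ is the desired variance for the estimated population total, the
principal formulas become as follows:

General:

$$n = \frac{\sum N_h^2 s_h^2 / w_h}{V + \sum N_h s_h^2} \qquad (5.45)$$

Presumed optimum (for fixed $n$):

$$n = \frac{\left(\sum N_h s_h\right)^2}{V + \sum N_h s_h^2} \qquad (5.46)$$

Proportional:

$$n_0 = \frac{N}{V}\sum N_h s_h^2; \qquad n = \frac{n_0}{1 + \dfrac{n_0}{N}} \qquad (5.47)$$
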
Example. This example is taken from a paper by Cornell (1947),
which describes a sample of United States colleges and universities
drawn in 1946 by the U. S. Office of Education, in order to estimate
enrollments for the 1946-47 academic year. The illustration given
here is for the population of 196 teachers’ colleges and normal schools.
These were arranged in seven strata, of which one small stratum will
be ignored. The first five strata were constructed by size of institution;
the sixth contained colleges for women only. Estimates $s_h$ of the
$S_h$ were computed from results for the 1943-44 academic year. An
“optimum” stratification based on these $s_h$ was employed.

The objective was a coefficient of variation of 5 per cent in the 
estimated total enrollment. In 1943 the total enrollment for this group 
of colleges was 56,472. Thus the desired standard error is 


(0.05) (56,472) = 2824 


so that the desired variance is 


V = (2824)? = 7,974,976 


It may be objected that enrollments will presumably be greater in 
1946 than in 1943, and that allowance should be made for this increase. 
Actually, the calculation assumes only that the cv per college remains
the same in 1943 and 1946, an assumption which may not be unreasonable.

Table 5.7 shows the values of $N_h$, $s_h$, and $N_h s_h$, which were known
before determining $n$.


TABLE 5.7 DATA FOR ESTIMATING SAMPLE SIZE

Stratum     $N_h$     $s_h$     $N_h s_h$     $n_h$
1            13       325        4,225          9
2            18       190        3,420          7
3            26       189        4,914         10
4            42        82        3,444          7
5            73        86        6,278         13
6            24       190        4,560         10
Totals      196                 26,841         56

The appropriate formula for $n$ is (5.46), which applies to an “optimum”
allocation for estimating a total. With only 196 units in this
population, it is improbable that the fpc will be negligible. However,
for purposes of illustration, a first approximation ignoring the fpc will
be sought. This is

$$n_0 = \frac{\left(\sum N_h s_h\right)^2}{V} = \frac{(26,841)^2}{7,974,976} = 90.34$$

Adjustment is obviously needed. For the correct $n$ in (5.46), we have

$$n = \frac{n_0}{1 + \dfrac{1}{V}\sum N_h s_h^2} = \frac{90.34}{1 + \dfrac{4,640,387}{7,974,976}} = 57.1$$

A sample size of 56 was chosen.* The $n_h$ for individual strata appear
in the right-hand column of table 5.7.
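
The arithmetic of this example can be retraced in a few lines. The sketch below is illustrative only (the variable names are hypothetical); it uses the $N_h$ and $s_h$ of table 5.7, the desired variance $V = (2824)^2$, and the presumed-optimum formula for a total, first without and then with the fpc.

```python
# Data from table 5.7: stratum sizes and estimated standard deviations.
N_h = [13, 18, 26, 42, 73, 24]
s_h = [325, 190, 189, 82, 86, 190]

V = 2824 ** 2  # desired variance of the estimated total

sum_Ns = sum(N * s for N, s in zip(N_h, s_h))        # sum of N_h s_h
sum_Ns2 = sum(N * s * s for N, s in zip(N_h, s_h))   # sum of N_h s_h^2

n0 = sum_Ns ** 2 / V            # first approximation, fpc ignored
n = n0 / (1 + sum_Ns2 / V)      # corrected for the fpc

print(sum_Ns, sum_Ns2, round(n0, 2), round(n, 1))
```

The corrected value is about 57, close to the sample of 56 actually chosen.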


5.14 Stratified sampling for proportions. If we wish to estimate the 
proportion of units in the population which fall into some defined class 
C, the ideal stratification is attained if we can place in the first stratum 
every unit which falls in C, and in the second every unit which does 
not. Failing this, we try to construct strata such that the proportion 


in class C varies as much as possible from stratum to stratum. 
Let $P_h$ and $p_h$
be the proportions of units in C in the $h$th stratum and in the sample
from that stratum, respectively. For the proportion in the whole
population, the estimate appropriate to stratified random sampling is

$$p_{st} = \frac{\sum N_h p_h}{N} \qquad (5.48)$$

Theorem 5.9 With stratified random sampling, the variance of $p_{st}$ is

$$V(p_{st}) = \frac{1}{N^2}\sum \frac{N_h^2 (N_h - n_h)}{(N_h - 1)} \frac{P_h Q_h}{n_h} \qquad (5.49)$$

Proof: This is a particular case of the general theorem for the variance
of the estimated mean. From theorem 5.3

$$V(\bar{y}_{st}) = \frac{1}{N^2}\sum N_h (N_h - n_h) \frac{S_h^2}{n_h}$$

* The arithmetical results differ slightly from those given by Cornell (1947). 
Let $y_{hi}$ be a variate which has the value 1 when the unit is in C, and
zero otherwise. In section 3.2 it was shown that for this variate

$$S_h^2 = \frac{N_h}{N_h - 1} P_h Q_h$$

This gives the result.
Note. In practically all applications, even if the fpc is not negligible,
terms in $1/N_h$ will be negligible, and the slightly simpler formula

$$V(p_{st}) = \frac{1}{N^2}\sum N_h (N_h - n_h) \frac{P_h Q_h}{n_h} \qquad (5.50)$$

can be used.

Corollary 1 When the fpc can be ignored,

$$V(p_{st}) = \frac{1}{N^2}\sum N_h^2 \frac{P_h Q_h}{n_h} = \sum W_h^2 \frac{P_h Q_h}{n_h} \qquad (5.51)$$

Corollary 2 With proportional allocation,

$$V(p_{st}) = \frac{(N - n)}{N}\,\frac{1}{nN}\sum \frac{N_h^2 P_h Q_h}{(N_h - 1)} \qquad (5.52)$$

$$\doteq \frac{(N - n)}{N}\,\frac{1}{n}\sum W_h P_h Q_h \qquad (5.53)$$

For a sample estimate of the variance, we may substitute $p_h$, $q_h$ for the
unknown $P_h$, $Q_h$ in any of the formulas above. This does not give an
unbiased estimate, but is adequate for the calculation of confidence limits.
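The best choice of the $n_h$ in order to minimize $V(p_{st})$ follows from
the general theory in sections 5.5 and 5.6.

Minimum variance for fixed total sample size.

$$n_h \propto N_h \sqrt{\frac{N_h}{N_h - 1}} \sqrt{P_h Q_h} \doteq N_h \sqrt{P_h Q_h}$$

Thus

$$n_h \doteq n \frac{N_h \sqrt{P_h Q_h}}{\sum N_h \sqrt{P_h Q_h}}$$

Minimum variance for fixed cost, where Cost $= a + \sum c_h n_h$.

$$n_h \doteq n \frac{N_h \sqrt{P_h Q_h / c_h}}{\sum N_h \sqrt{P_h Q_h / c_h}}$$

The value of $n$ is found as in section 5.6.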
Optimum versus proportional allocation. If the cost of sampling per
unit is the same in all strata, and if all $P_h$ lie between 0.1 and 0.9, there
appears little to be gained by optimum allocation over proportional
allocation.

Suppose that the fpc is ignored and that optimum allocation is made
for fixed $n$, i.e. with $n_h \propto N_h \sqrt{P_h Q_h}$. It will be found that

$$V_{opt} = \frac{\left(\sum W_h \sqrt{P_h Q_h}\right)^2}{n}; \qquad V_{prop} = \frac{\sum W_h P_h Q_h}{n}$$

so that the relative precision of proportional to optimum allocation is

$$\frac{V_{opt}}{V_{prop}} = \frac{\left(\sum W_h \sqrt{P_h Q_h}\right)^2}{\sum W_h P_h Q_h}$$

If all $P_h$ lie between the two values $P_0$ and $(1 - P_0)$, we are interested
in the smallest value which the relative precision takes. For
simplicity, we consider two strata of equal size ($W_1 = W_2$). The
minimum relative precision is attained when $P_1 = \frac{1}{2}$ and $P_2 = P_0$.
The relative precision then becomes

$$\frac{V_{opt}}{V_{prop}} = \frac{\left(0.5 + \sqrt{P_0 Q_0}\right)^2}{2(0.25 + P_0 Q_0)}$$


Some values of this function are given in table 5.8. Even with $P_0$ as low
as 0.1, or as high as 0.9, the relative precision is 94 per cent. In most
cases the simplicity and the self-weighting feature of proportional
stratification more than compensate for this slight loss in precision.


TABLE 5.8 RELATIVE PRECISION OF PROPORTIONAL TO OPTIMUM ALLOCATION

$P_0$      0.4 or 0.6   0.3 or 0.7   0.2 or 0.8   0.1 or 0.9   0.05 or 0.95
RP(%)        100.0         99.8         98.8         94.1          86.6
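
The entries of table 5.8 follow directly from the two-stratum expression for the relative precision with $P_1 = 0.5$ and $P_2 = P_0$. A minimal sketch (the function name is hypothetical):

```python
import math


def relative_precision(p0):
    """V_opt / V_prop for two equal strata with P_1 = 0.5 and P_2 = p0."""
    pq = p0 * (1.0 - p0)
    return (0.5 + math.sqrt(pq)) ** 2 / (2.0 * (0.25 + pq))


for p0 in (0.4, 0.3, 0.2, 0.1, 0.05):
    print(p0, round(100 * relative_precision(p0), 1))
```

Because the function depends on $P_0$ only through $P_0 Q_0$, the values for $P_0$ and $1 - P_0$ coincide, which is why each column of the table carries two labels.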


The limitations of the example should be noted. It does not take
account of differential costs of sampling in different strata. There are
surveys where the $P_h$ are very small, but range from, say, 0.001 to 0.05
in different strata. Here there would be a more substantial gain from
optimum stratification.

Formulas for the determination of sample size can be deduced from
the more general formulas in section 5.13. Let $V$ be the desired variance
in the estimate of the proportion $P$ for the whole population.
The formulas for the two principal types of allocation are as follows:

Proportional:

$$n_0 = \frac{\sum W_h p_h q_h}{V}; \qquad n = \frac{n_0}{1 + \dfrac{n_0}{N}} \qquad (5.56)$$

Presumed optimum:

$$n_0 = \frac{\left(\sum W_h \sqrt{p_h q_h}\right)^2}{V}; \qquad n = \frac{n_0}{1 + \dfrac{1}{NV}\sum W_h p_h q_h} \qquad (5.57)$$

where $n_0$ is the first approximation, which ignores the fpc, and $n$ is the
corrected value taking account of the fpc. In the development of
these formulas, the factor $N_h/(N_h - 1)$ has been taken as unity.

All results in this section apply to the estimate of a proportion. If it
is preferable to think in terms of percentages, the same formulas apply
if $P_h$, $Q_h$, $V$, etc., are expressed as percentages. For the estimation of
the total number in the population which lie in class C, i.e. of $NP$, all
variances are multiplied by $N^2$.


5.15 The construction of strata. This topic raises a number of questions.
What is the best characteristic for the construction of strata?
How many strata should there be? How should the boundaries between
the strata be determined? Much investigation remains to be
done on these questions, although some answers in general terms can
be given.

For a single item or variable, it seems obvious that the best characteristic
for stratification, if we could possess it, is the frequency distribution
of the variable itself. The next best thing is the frequency
distribution of some other quantity that is highly correlated with the
variable, such as the values of the variable itself in a previous census.

So far as the number of strata is concerned, the answer is at first
sight the more the merrier. This follows from the general result about
the relative precision of stratified random to simple random sampling.
In section 5.7 it was shown that if the fpc can be ignored,

$$V_{opt} \le V_{prop} \le V_{ran}$$

It was suggested that these results remain valid, in nearly all cases,
even if the fpc is not negligible. Suppose that a stratum already in
existence is divided into two new strata. For this stratum the effect
is to replace simple random sampling by stratified sampling, which
should result in a lower variance for the estimated stratum mean, and
hence for the mean of the whole population. Thus the process of
constructing new strata by the subdivision of old strata should result
in a series of decreasing variances in the estimates.

Multiplication of strata is, however, attended by some disadvantages.
Unless allocation is proportional, the number of weighting factors
$W_h$ which appear in the tabulations increases, as do the number
of within-stratum variances to be estimated, both for allocation and
for finding the standard error of $\bar{y}_{st}$. The number of degrees of freedom
available for the estimated standard error decreases until with
one unit per stratum the formula for the standard error cannot be used 
at all. If an increase in the number of strata is under consideration, 
we want to be assured that the ensuing gain in precision is sufficient 
to repay us for these complications. 

Consider a process in which each stratum is divided into halves to
form new strata. The number of strata becomes successively 2, 4, 8,
16, etc. If we can use the frequency distribution of $y_i$ itself for the
construction of strata, and if the distribution of $y_i$ is continuous, every
stage in the process produces a substantial reduction in the variance
of the estimate. When the number of strata becomes large, each
doubling reduces the variance to approximately one-quarter of the
previous value.

To show this, suppose that before subdivision a typical stratum
consists of all values of $y_i$ between $a$ and $b$. If there are already many
strata, the distribution of $y_i$ between $a$ and $b$ will be approximately
rectangular. The variance of this rectangular distribution is known
by theory to be $(b - a)^2/12$. When we halve the stratum, we produce
two approximately rectangular distributions, each with range $(b - a)/2$
and variance $(b - a)^2/48$. With proportional allocation, ignoring the
fpc,

$$V(\bar{y}_{st}) = \frac{\sum W_h S_h^2}{n}$$

Hence the contribution of this stratum to $V(\bar{y}_{st})$ is reduced by the
subdivision from

$$\frac{W_h (b - a)^2}{12n}$$

to

$$\frac{W_h}{2}\,\frac{(b - a)^2}{48n} + \frac{W_h}{2}\,\frac{(b - a)^2}{48n} = \frac{W_h (b - a)^2}{48n}$$

As an illustration, the values of $V(\bar{y}_{st})$ were calculated for 2, 4, 8,
and 16 strata for a variate with the frequency function $f(y_i) = e^{-y_i}$
($y_i > 0$). All strata in a given subdivision were of the same size,
although this is not the best method of division for maximum precision.
Both proportional and optimum allocation (for fixed sample
size) were included. The values of $nV(\bar{y}_{st})$ and the ratios of successive
values appear in table 5.9.


TABLE 5.9 VALUES OF $nV(\bar{y}_{st})$ FOR STRATA FORMED FROM $e^{-y_i}$

                       Optimum                   Proportional
Number of        $nV$      Ratio to         $nV$      Ratio to
strata                     preceding                  preceding
 1             1                          1
 2             0.27722      0.277         0.40747      0.407
 4             0.06690      0.241         0.11822      0.290
 8             0.01618      0.242         0.03079      0.260
16             0.00397      0.245         0.00778      0.253


For proportional allocation, the variance does not at first decrease
to one-quarter with a doubling of the number of strata, because the
stratum which extends to infinity has a variance substantially larger
than the others. However, the ratio quickly settles close to one-quarter
for both types of stratification.

In practice, the stratification is not based on the frequency distribution
of $y_i$. Suppose that it is based on the frequency distribution of a
variate $z_i$ which is correlated with $y_i$. Let $y_i = z_i + e_i$, where the
variates $z_i$ and $e_i$ are independently distributed. Strata formed by
cutting up the frequency distribution of $z_i$ will not affect the variance
of $e_i$. When there are many strata, the division of a stratum into
halves will reduce the contribution of this stratum from

$$\frac{W_h (b - a)^2}{12n} + \frac{W_h \sigma_{eh}^2}{n}$$

to

$$\frac{W_h (b - a)^2}{48n} + \frac{W_h \sigma_{eh}^2}{n}$$

As soon as $(b - a)$ is small enough so that the term in $\sigma_{eh}^2$ dominates,
further increase in the number of strata produces only a trivial reduction
in variance.
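
A small numerical illustration of this levelling-off may help. The sketch below rests on assumptions that are not in the text: $z_i$ is taken to be rectangular on (0, 1), the strata are $L$ equal intervals, allocation is proportional with the fpc ignored, and the error variance is the same in every stratum. Under those assumptions $nV(\bar{y}_{st}) = 1/(12L^2) + \sigma_e^2$.

```python
def n_times_variance(num_strata, sigma_e2):
    """n V(y_st) for z uniform on (0, 1) cut into equal strata.

    Each stratum has weight 1/L and within-stratum variance
    (1/L)^2 / 12 + sigma_e2, so the weighted sum is the same expression.
    sigma_e2 is a hypothetical error variance, constant across strata.
    """
    width = 1.0 / num_strata
    return width ** 2 / 12.0 + sigma_e2


sigma_e2 = 0.01  # assumed value, for illustration only
previous = None
for L in (1, 2, 4, 8, 16, 32):
    value = n_times_variance(L, sigma_e2)
    ratio = None if previous is None else value / previous
    print(L, round(value, 5), None if ratio is None else round(ratio, 3))
    previous = value
```

With no error term the ratio of successive values is exactly one-quarter; with the assumed error variance it climbs toward 1 after a handful of doublings.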

To summarize, subdivision of strata in practice sooner or later results
in practically no further increase in precision. If $z_i$ has only a
moderate correlation with $y_i$, the point may be reached with a small
number of strata.

If strata are formed by cutting up a frequency distribution, there
remains the question: What are the best points of division?

Rules have been developed by Dalenius (1950) and Dalenius and
Gurney (1951) for division under proportional and optimum allocation.
One result will be quoted. If the variate $z_{hi}$ on which the division is
based is linearly related to the variate $y_{hi}$ which is to be measured in
the survey, the division point $z_h'$ between stratum $h$ and stratum
$(h + 1)$ should satisfy the equation

$$z_h' = \frac{\bar{z}_h + \bar{z}_{h+1}}{2}$$

where $\bar{z}_h$, $\bar{z}_{h+1}$ are the mean values of $z_{hi}$ in the two strata. This is
the optimum rule for proportional allocation. The division points $z_1'$,
$z_2'$, $\cdots$, $z_{L-1}'$ are found by trial and error, the number of strata $L$
being assumed chosen in advance.
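
One way to mechanize the search is a simple fixed-point iteration; this is offered as an illustrative substitute for the trial and error mentioned in the text, not as the procedure of Dalenius. It assumes the rule reads "each division point is the midpoint of the means of the two adjoining strata", works on a finite list of $z$ values, and uses a hypothetical function name.

```python
def midpoint_boundaries(values, num_strata, iterations=100):
    """Iterate cut points until each equals the midpoint of adjacent stratum means.

    values: observed z values; num_strata: L, chosen in advance.
    Starts from equally spaced cut points. May settle on a local solution.
    """
    z = sorted(values)
    lo, hi = z[0], z[-1]
    cuts = [lo + (hi - lo) * k / num_strata for k in range(1, num_strata)]
    for _ in range(iterations):
        edges = [float("-inf")] + cuts + [float("inf")]
        means = []
        for h in range(num_strata):
            members = [v for v in z if edges[h] < v <= edges[h + 1]]
            if not members:
                raise ValueError("empty stratum; try fewer strata")
            means.append(sum(members) / len(members))
        new_cuts = [(means[h] + means[h + 1]) / 2.0
                    for h in range(num_strata - 1)]
        if new_cuts == cuts:   # rule satisfied exactly: stop
            break
        cuts = new_cuts
    return cuts


print(midpoint_boundaries([0, 1, 2, 3, 10, 11, 12, 13], 2))
```

For the toy data the two stratum means are 1.5 and 11.5, so the single division point settles at 6.5.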

A stratification that is effective for one variable may not be so for 
another. In surveys which cover a range of items, some compromise 
criterion for the construction of strata must be found. Bases of 
stratification for economic items have been discussed by Stephan 
(1941) and Hagood and Bernert (1945), and for farm items by King 
and McCarty (1941). What is wanted is some variable that has high 
correlations with all the principal items in the survey. 


5.16 Proximity as a basis for stratification. In surveys which cover 
a geographic region, adjacent units are often more alike than units 
that are far apart. Farmers living near one another tend to have a 
similar soil type, a similar kind of farming, and a similar access to 
markets. People in the same part of a town tend to be of similar 
economic level and to have certain things in common in their attitudes 
and tastes. These remarks do not imply that the similarities are 
strong, but merely that they exist. 

This phenomenon can be utilized by constructing strata which 
consist of compact geographic regions. For a number of typical farm 
economic items, data on the effectiveness of geographic stratification 
within states in the United States are shown in table 5.10. The data 
were presented by Jessen (1942) and Jessen and Houseman (1944). 
Four sizes of stratum are represented: the township, the county, the
“type of farming” area, and the state. Some idea of the relative sizes
of the strata is obtained from the fact that in Iowa there are about 1600
townships, 100 counties, and 5 areas.


TABLE 5.10 RELATIVE PRECISION OF DIFFERENT KINDS OF GEOGRAPHIC
STRATIFICATION (IN PER CENT)

                                                  Stratum
State                    No. of    Township   County   Type of         State
                         items                         farming area
Iowa, 1938                 18        115       100         96            91
Iowa, 1939                 19        121       100         97            91
Florida, 1942
  Citrus fruit area        14        144       100
  Truck farming area       15        111       100
California, 1942           17        113       100         97

The data shown are the average relative precisions of the estimates 
of the means, averaged over the numbers of items shown in the table. 
The county is taken as a standard in each case. 

As appears to be typical of geographic stratification, the increases 
in precision from increased numbers of strata are moderate rather than 
large. This indicates that the similarities referred to above are weak. 


5.17 Estimation from a sample of the gain due to stratification.
When a stratified random sample has been taken, it may be of interest,
as a guide to the conduct of future surveys, to appraise the gain in
precision relative to simple random sampling.

The data available from the sample are the values of $N_h$, $n_h$, $\bar{y}_h$,
and $s_h^2$. From section 5.4, the estimated variance of the weighted
mean from the stratified sample is

$$v(\bar{y}_{st}) = \sum \frac{W_h^2 s_h^2}{n_h} - \sum \frac{W_h s_h^2}{N}$$

where $W_h = N_h/N$.

The problem is to compare this variance with an estimate of the
variance of the mean that would have been obtained from a simple
random sample. A procedure sometimes used is to calculate the
familiar mean square deviation from the sample mean,

$$s^2 = \frac{\sum\sum (y_{hi} - \bar{y})^2}{n - 1}$$

where the strata are ignored. This is taken as an estimate of the variance
per unit for a simple random sample. This method works well
enough if the allocation is proportional, since a simple random sample
distributes itself approximately proportionally among strata. But if
an allocation far from proportional has been adopted, the sample
actually taken does not resemble a simple random sample, and this
$s^2$ may be a poor estimate. A general procedure will now be given.

The true variance of the mean of a simple random sample is

$$V_{ran} = \frac{(N - n)}{nN} S^2 = \frac{(N - n)}{nN(N - 1)}\left[\sum (N_h - 1) S_h^2 + \sum N_h (\bar{Y}_h - \bar{Y})^2\right] \qquad (5.58)$$

by an algebraic identity for $S^2$.

There is no difficulty in obtaining an estimate of the first term inside
the bracket. The second term requires some detailed investigation.
For estimating $\sum N_h(\bar{Y}_h - \bar{Y})^2$ it is natural to try $\sum N_h(\bar{y}_h - \bar{y}_{st})^2$.
This quantity turns out to be an overestimate which needs adjustment.
We may write

$$\sum N_h(\bar{y}_h - \bar{y}_{st})^2 = \sum N_h\{(\bar{Y}_h - \bar{Y}) + (\bar{y}_h - \bar{Y}_h) - (\bar{y}_{st} - \bar{Y})\}^2$$

We now expand and take the average over all possible samples. It
may be verified that the average of each of the two cross-product
terms involving $(\bar{Y}_h - \bar{Y})$ vanishes. This gives

$$E\sum N_h(\bar{y}_h - \bar{y}_{st})^2 = \sum N_h(\bar{Y}_h - \bar{Y})^2 + E\sum N_h(\bar{y}_h - \bar{Y}_h)^2 + E\sum N_h(\bar{y}_{st} - \bar{Y})^2 - 2E\sum N_h(\bar{y}_h - \bar{Y}_h)(\bar{y}_{st} - \bar{Y}) \qquad (5.59)$$

But

$$\sum N_h(\bar{y}_h - \bar{Y}_h)(\bar{y}_{st} - \bar{Y}) = N(\bar{y}_{st} - \bar{Y})^2$$

by the definitions of $\bar{y}_{st}$ and $\bar{Y}$. Thus the last two terms in (5.59) coalesce
to give

$$-E\,N(\bar{y}_{st} - \bar{Y})^2 = -\sum \frac{N_h(N_h - n_h)}{N}\,\frac{S_h^2}{n_h}$$

since $E\,N(\bar{y}_{st} - \bar{Y})^2$ is $N$ times the variance of $\bar{y}_{st}$. For the second
term on the right in (5.59),

$$E\sum N_h(\bar{y}_h - \bar{Y}_h)^2 = \sum N_h \frac{(N_h - n_h)}{N_h}\,\frac{S_h^2}{n_h} = \sum (N_h - n_h)\frac{S_h^2}{n_h}$$

because within each stratum $\bar{y}_h$ is the mean of a simple random sample.
Hence,

$$E\sum N_h(\bar{y}_h - \bar{y}_{st})^2 = \sum N_h(\bar{Y}_h - \bar{Y})^2 + \sum (N_h - n_h)\frac{S_h^2}{n_h} - \sum \frac{N_h(N_h - n_h)}{N}\,\frac{S_h^2}{n_h}$$

The sum of the last two terms on the right is always positive, so that
$\sum N_h(\bar{y}_h - \bar{y}_{st})^2$ gives an overestimate. It follows that an unbiased
estimate of $\sum N_h(\bar{Y}_h - \bar{Y})^2$ is

$$Q = \sum N_h(\bar{y}_h - \bar{y}_{st})^2 - \sum (N_h - n_h)\frac{s_h^2}{n_h} + \sum \frac{N_h(N_h - n_h)}{N}\,\frac{s_h^2}{n_h}$$


Finally, by substituting this estimate in equation (5.58), we obtain as
an unbiased estimate of $V_{ran}$,

$$v_{ran} = \frac{(N - n)}{nN(N - 1)}\left[\sum (N_h - 1)s_h^2 + Q\right] \qquad (5.60)$$

For computing purposes it is convenient to express this in terms of the
relative stratum sizes $W_h = N_h/N$. When the value of $Q$ is inserted
in (5.60), we find, after a little cancellation,

$$v_{ran} = \frac{(N - n)}{n(N - 1)}\left[\sum W_h s_h^2 - \sum \frac{W_h s_h^2}{n_h} + \sum \frac{W_h^2 s_h^2}{n_h} - \frac{1}{N}\sum W_h s_h^2 + \sum W_h \bar{y}_h^2 - \left(\sum W_h \bar{y}_h\right)^2\right]$$

This expression is unattractive to compute. In nearly all applications,
some simplifications can be utilized. Two will be given.

$N > 50$. This will hold for almost all populations. The fourth term
inside the bracket may be omitted, since it equals the first term, divided
by $N$. We obtain

$$v_{ran} = \frac{(N - n)}{nN}\left[\sum W_h s_h^2 - \sum \frac{W_h s_h^2}{n_h} + \sum \frac{W_h^2 s_h^2}{n_h} + \sum W_h \bar{y}_h^2 - \left(\sum W_h \bar{y}_h\right)^2\right] \qquad (5.61)$$

All $n_h > 50$. The second and third terms inside the bracket may
be dropped, to give

$$v_{ran} = \frac{(N - n)}{nN}\left[\sum W_h s_h^2 + \sum W_h \bar{y}_h^2 - \left(\sum W_h \bar{y}_h\right)^2\right] \qquad (5.62)$$




Example. The calculations will be illustrated from the first three 
strata in the sample of teachers’ colleges (section 5.13). Data from 
the 1946 sample appear in table 5.11. The means represent enroll- 
ment per college, in thousands. 


TABLE 5.11 BASIC DATA FROM A STRATIFIED SAMPLE

Stratum     $N_h$     $n_h$     $\bar{y}_h$     $s_h^2$
1            13         9        2.200           1.615
2            18         7        1.638           0.063
3            26        10        0.992           0.077
Totals       57        26


The sample is so small that expression (5.61) for $v_{ran}$ will be used.
The supplementary calculations are given in table 5.12.


TABLE 5.12 ARRANGEMENT OF CALCULATIONS

Stratum   $W_h$    $W_h s_h^2$   $W_h s_h^2/n_h$   $W_h^2 s_h^2/n_h$   $W_h \bar{y}_h$
1         0.228     0.36822        0.04091            0.00933            0.50160
2         0.316     0.01991        0.00284            0.00090            0.51761
3         0.456     0.03511        0.00351            0.00160            0.45235
Totals    1.000     0.42324        0.04726            0.01183            1.47156


The formulas work out as follows:

$$v(\bar{y}_{st}) = \sum \frac{W_h^2 s_h^2}{n_h} - \frac{\sum W_h s_h^2}{N} = 0.0118 - 0.00743 = 0.0044$$

$$v_{ran} = \frac{31}{(57)(26)}\left[0.4232 - 0.0473 + 0.0118 + 2.4000 - 2.1655\right] = 0.0130$$


Stratification appears to have reduced the variance to about one- 
third of the value for a simple random sample. 
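
The worked figures can be reproduced from the four columns of table 5.11. The sketch below is illustrative only; it takes (5.61) in the five-term form that matches the bracket of the worked example, uses unrounded $W_h = N_h/N$ (so intermediate values differ slightly from table 5.12), and the variable names are hypothetical.

```python
# Data from table 5.11.
N_h = [13, 18, 26]
n_h = [9, 7, 10]
ybar_h = [2.200, 1.638, 0.992]
s2_h = [1.615, 0.063, 0.077]

N = sum(N_h)
n = sum(n_h)
W = [Nh / N for Nh in N_h]

# Estimated variance of the stratified mean.
v_st = (sum(w * w * s2 / m for w, s2, m in zip(W, s2_h, n_h))
        - sum(w * s2 for w, s2 in zip(W, s2_h)) / N)

# Estimated variance of a simple random sample mean, formula (5.61).
bracket = (sum(w * s2 for w, s2 in zip(W, s2_h))
           - sum(w * s2 / m for w, s2, m in zip(W, s2_h, n_h))
           + sum(w * w * s2 / m for w, s2, m in zip(W, s2_h, n_h))
           + sum(w * y * y for w, y in zip(W, ybar_h))
           - sum(w * y for w, y in zip(W, ybar_h)) ** 2)
v_ran = (N - n) / (n * N) * bracket

print(round(v_st, 4), round(v_ran, 4), round(v_st / v_ran, 2))
```

The ratio of the two estimates comes out near one-third, in line with the conclusion drawn in the text.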

Proportional allocation. An estimate $v_{ran}$ that is usually adequate is
obtained from the sum of squares of deviations of the sample values
from their mean, for

$$s^2 = \frac{\sum\sum (y_{hi} - \bar{y})^2}{n - 1} = \frac{1}{n - 1}\left[\sum (n_h - 1)s_h^2 + \sum n_h(\bar{y}_h - \bar{y})^2\right]$$

by the usual identity in the analysis of variance. If terms in $1/n_h$ are
negligible, this is equivalent to

$$\sum W_h s_h^2 + \sum W_h \bar{y}_h^2 - \left(\sum W_h \bar{y}_h\right)^2$$

since $W_h = n_h/n$. This in turn equals the quantity inside the bracket
in equation (5.62). Thus the expression

$$v_{ran} = \frac{(N - n)s^2}{nN}$$

is satisfactory if allocation is proportional and terms in $1/n_h$ are
negligible.

Proportional allocation with $S_h$ constant. This case is assumed to
hold in numerous applications: e.g. in sampling an agricultural field,
where the strata are subdivisions of the field. The constant value of
$S_h^2$, say $S_w^2$, is estimated by the pooled mean square within strata,
$s_w^2$. For $v(\bar{y}_{st}) = v_{st}$ we have from equation (5.8)

$$v_{st} = \frac{(N - n)}{N}\,\frac{s_w^2}{n}$$

For $v_{ran}$, we assume that terms in $1/N$ are negligible, but not terms
in $1/n_h$, since sometimes there are only a few units per stratum in the
sample. Thus we start from equation (5.61), inserting, however, the
pooled estimate $s_w^2$ in place of the individual $s_h^2$. This gives

$$v_{ran} = \frac{(N - n)}{nN}\left[s_w^2\left(\sum W_h - \sum \frac{W_h}{n_h} + \sum \frac{W_h^2}{n_h}\right) + \sum W_h \bar{y}_h^2 - \left(\sum W_h \bar{y}_h\right)^2\right]$$

Since $n_h = nW_h$, the coefficient of $s_w^2$ may be written

$$1 - \frac{L}{n} + \frac{1}{n} = \frac{n - L + 1}{n}$$

where $L$ is the number of strata. Hence

$$v_{ran} = \frac{(N - n)}{n^2 N}\left\{(n - L + 1)s_w^2 + \sum n_h(\bar{y}_h - \bar{y}_{st})^2\right\}$$

These quantities may be computed from an analysis of variance of the
sample data into two components, as shown in table 5.13.


TABLE 5.13 ANALYSIS OF VARIANCE OF THE SAMPLE DATA

Component          df          ms
Between strata     $(L - 1)$   $B = \sum n_h(\bar{y}_h - \bar{y}_{st})^2/(L - 1)$
Within strata      $(n - L)$   $W = s_w^2$

It follows that

$$v_{ran} = \frac{(N - n)}{N}\,\frac{1}{n^2}\left[(L - 1)B + (n - L + 1)W\right]$$


Example. In sampling an agricultural field experiment for estimat- 
ing the number of wireworms per plot, the 25 plots were divided into 
halves. From each half 3 random samples of soil were taken with a 
small boring tool. The sample was a volume of soil 9 in. square to a 
depth of 5 in. The combined analysis of variance of numbers of wire- 
worms for the 25 plots was as follows: 


                                 df       ms
Between strata (half-plots)      25       B = 90.76
Within strata                   100       W = 38.44


The conditions are slightly different from those in the theory presented
above. Each plot represents a separate population, divided
into 2 strata. Thus $n = 6$, $n_h = 3$, $L = 2$ for each plot. The analysis
of variance gives the combined results for 25 stratified samples of this
type. The fpc need not be included.

$$v_{st} = \frac{s_w^2}{n} = \frac{38.44}{6} = 6.41$$

$$v_{ran} = \frac{1}{36}\left[B + 5W\right] = \frac{1}{36}\left[90.76 + 5(38.44)\right] = 7.86$$

$$\text{Relative precision} = \frac{7.86}{6.41} = 1.23$$

Stratification into halves increased the precision by slightly under 
one-fourth. 
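
The same computation can be wrapped in two small functions taking the mean squares of the analysis of variance. This is a sketch under the assumptions of this case (proportional allocation, constant within-stratum variance); the formula for $v_{ran}$ is taken in the form $[(L-1)B + (n-L+1)W]/n^2$, which reproduces the printed figures, and the function names are hypothetical.

```python
def v_st_from_anova(within_ms, n, fpc=1.0):
    """Estimated variance of the stratified mean: fpc * W / n."""
    return fpc * within_ms / n


def v_ran_from_anova(between_ms, within_ms, n, num_strata, fpc=1.0):
    """Estimated variance of a simple random sample mean of the same size."""
    L = num_strata
    return fpc * ((L - 1) * between_ms + (n - L + 1) * within_ms) / n ** 2


# Wireworm example: B = 90.76, W = 38.44, n = 6, L = 2, fpc omitted.
v_st = v_st_from_anova(38.44, 6)
v_ran = v_ran_from_anova(90.76, 38.44, 6, 2)
print(round(v_st, 2), round(v_ran, 2), round(v_ran / v_st, 2))
```

The `fpc` argument stands for the factor $(N - n)/N$ and defaults to 1, matching the statement that the fpc need not be included here.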


5.18 Effects of errors in the stratum sizes. For a desirable type of
stratification, the stratum totals $N_h$ may not be known exactly, being
derived from a census that is out of date or from a large sample.
Definite statements about the consequences of basing a stratification
upon erroneous weights cannot be made without considering particular
cases. A few conclusions of a general nature can be drawn. The fpc
will be ignored.

Instead of the true stratum proportions $W_h$, we have estimates $w_h$.
The sample estimated mean is $\sum w_h\bar{y}_h$. This estimate is biased. Its
mean value in repeated sampling is $\sum w_h\bar{Y}_h$. The bias amounts to
$\sum (w_h - W_h)\bar{Y}_h$. Consequently, the error variance of this estimate
contains two components: the variance about the mean of the estimate,
and the square of the bias. If “optimum” allocation is used (with the
$N_h$ replaced by their estimates), the first component is $(\sum w_hS_h)^2/n$.
The total variance is

$$V(\bar{y}_{st}) = \frac{(\sum w_hS_h)^2}{n} + \left\{\sum (w_h - W_h)\bar{Y}_h\right\}^2 \qquad (5.63)$$

A more general form of this expression was given by Stephan (1941). 
As he points out, the first term in (5.63) will usually not differ greatly 
from the variance with the correct weights—the two are exactly the 
same if the variance is the same in all strata. The loss of precision 
from incorrect weights thus depends mainly on the size of the bias, 
which in individual cases may be small or large. For any given set of 
erroneous weights, the loss varies with the size of sample, because the 
“bias” component of the total variance is independent of the size of 
sample. With increasing sample size, a stage is reached where the 
“bias” term predominates, and where the stratification is less accurate 
than simple random sampling. 
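
The behaviour of the two components of (5.63) as the sample grows can be seen in a small Python sketch. The function name and the two-stratum figures below are hypothetical, chosen only for illustration; they are not taken from the text.

```python
def variance_with_erroneous_weights(w, W, S, Ybar, n):
    """Return the two components of (5.63), fpc ignored.

    w    : stratum weights actually used (erroneous)
    W    : true stratum weights
    S    : stratum standard deviations
    Ybar : stratum means
    n    : total sample size, allocated "optimally" from the weights w
    """
    variance_term = sum(wh * sh for wh, sh in zip(w, S)) ** 2 / n
    bias = sum((wh - Wh) * yb for wh, Wh, yb in zip(w, W, Ybar))
    return variance_term, bias ** 2


# Hypothetical population: weights in error by 0.1 in each stratum.
w, W = [0.5, 0.5], [0.6, 0.4]
S, Ybar = [2.0, 2.0], [10.0, 20.0]
for n in (25, 100, 400, 1600):
    var_term, bias_sq = variance_with_erroneous_weights(w, W, S, Ybar, n)
    print(n, round(var_term, 4), round(bias_sq, 4))
```

The first component falls as 1/n while the squared bias stays fixed, which is the reason the "bias" term eventually predominates.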

This discussion does not help in considering whether to stratify in a
survey where the weights are known to be in error, because the size of
the bias cannot be predicted. Occasionally a standard error can be
attached to the estimate of each $N_h$, from knowledge of the process by
which they were estimated. If the estimates of the $N_h$ are independent,
and independent of the $\bar{y}_h$, the average value of the bias component
of the total variance of $\bar{y}_{st}$ is roughly (Cochran, 1939)

$$\frac{\sum (\bar{Y}_h - \bar{Y})^2\,V(N_h)}{N^2}$$

where $V(N_h)$ is the variance of the estimate of $N_h$. This quantity
measures the expected increase in variance of $\bar{y}_{st}$ due to errors in the $N_h$.

King, McCarty, and McPeek (1942) applied this formula in research
directed towards the estimation of yield per acre, protein, and test
weight in the U. S. wheat belt. They discussed the utility of
stratification by districts within each state. The total acreages $N_h$ for each
district were themselves estimated by a sample survey, so that some
knowledge of the $V(N_h)$ was available.

This problem arises in the technique known as double sampling
(section 12.2). It will be shown that if the $W_h$ are estimated by taking
a large random sample of size $n'$ ($n' > n$), and noting how many fall in
each stratum, then the increase in $V(\bar{y}_{st})$ is approximately

$$\frac{\sum W_h(\bar{Y}_h - \bar{Y})^2}{n'}$$


5.19 Case where the strata cannot be identified in advance. Sometimes
the stratum to which a unit belongs is not known until the data
have been obtained from the unit. In a survey on political attitudes
it may be useful to stratify according to the individual’s voting
behavior at the last election, which is not learned until the individual
has been interviewed. A similar situation may occur when stratification
is based on factors such as income, occupation, and religious
affiliation. Of course, in such cases the stratum sizes $N_h$ may not be
known exactly: we will, however, assume that they are known.

One procedure is to take a simple random sample of size n. When 
the sample data have been collected, the units are assigned to the 
strata by means of the information obtained about them. If $\bar{y}_h$ is the
mean of the units that fall in the hth stratum, the estimate is

$$\bar{y}_w = \frac{\sum N_h\bar{y}_h}{N}$$

In other words we use the true stratum sizes as weights to obtain a
weighted mean, instead of taking the unweighted mean of the sample.

If the sample is reasonably large, this technique is almost as precise
as proportional stratified sampling. Let $m_h$ be the number in the
sample that fall in the hth stratum, where $m_h$ will vary from sample to
sample. For samples in which the $m_h$ are fixed,

$$V(\bar{y}_w) = \frac{1}{N^2}\sum N_h^2\frac{S_h^2}{m_h}$$

where the fpc is ignored.

The average value of this quantity in repeated sampling must now
be calculated. This procedure requires a little care, since one or more
of the $m_h$ could be zero. If this occurred, we would have to combine
two or more strata before making the estimate. The result would be
a less accurate stratification and a higher variance for $\bar{y}_w$. With
increasing n the probability that any $m_h$ is zero is so small that the
contribution to the variance from this source is negligible. (This
statement is given without proof.)

If the case where $m_h$ is zero is ignored, Stephan (1945) has shown
that to terms of order $n^{-2}$

$$E\left(\frac{1}{m_h}\right) = \frac{1}{nW_h} - \frac{1}{n^2W_h} + \frac{1}{n^2W_h^2}$$

where $W_h = N_h/N$. Hence

$$E\{V(\bar{y}_w)\} = \frac{1}{n}\sum W_hS_h^2 + \text{terms in } n^{-2}$$

The leading term is the variance obtained with proportional stratified
sampling.
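
The procedure of this section may be sketched in Python as follows. This is an illustration only: the function names, the stratum labels, and the numerical values are our own assumptions. The second function evaluates the leading term together with the term in $n^{-2}$ that results when Stephan's expansion of $E(1/m_h)$ is substituted in $V(\bar{y}_w)$, namely $\sum (1 - W_h)S_h^2/n^2$.

```python
def poststratified_mean(values, labels, W):
    """Weighted mean sum_h W_h * ybar_h from a simple random sample.

    values : observed y for each sampled unit
    labels : stratum label learned for each unit after collection
    W      : dict mapping stratum label -> true proportion W_h = N_h / N
    """
    totals, counts = {}, {}
    for y, h in zip(values, labels):
        totals[h] = totals.get(h, 0.0) + y
        counts[h] = counts.get(h, 0) + 1
    empty = [h for h in W if h not in counts]
    if empty:
        # Some m_h is zero: strata must be combined before estimating.
        raise ValueError(f"no sample units in strata {empty}")
    return sum(W[h] * totals[h] / counts[h] for h in W)


def expected_variance_terms(W, S2, n):
    """Leading term and n^-2 term of E{V(ybar_w)}, fpc ignored."""
    leading = sum(W[h] * S2[h] for h in W) / n
    second = sum((1 - W[h]) * S2[h] for h in W) / n**2
    return leading, second


# Hypothetical sample of five units falling in two strata.
estimate = poststratified_mean(
    values=[1, 3, 10, 14, 12],
    labels=["a", "a", "b", "b", "b"],
    W={"a": 0.4, "b": 0.6},
)
leading, second = expected_variance_terms(
    W={"a": 0.5, "b": 0.5}, S2={"a": 4.0, "b": 16.0}, n=100
)
print(estimate, leading, second)
```

For a sample of moderate size the second term is small compared with the first, in line with the statement that the technique is almost as precise as proportional stratified sampling.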


5.20 Quota sampling. Another method that is used in this situation 
is to decide in advance the $n_h$ that are wanted from each stratum and
to instruct the enumerator to continue sampling until the necessary 
“quota” has been obtained in each stratum. If the enumerator ini- 
tially chooses units at random, rejecting those that are not needed, 
this method is equivalent to stratified random sampling. In the later 
stages of sampling, this may require considerable work on the part of 
the enumerator, since most of the units that are contacted may fall in 
strata where the quota has already been met. 

As this method is used in practice by a number of agencies, the 
enumerator does not select units at random. Instead, he takes ad- 
vantage of any information that enables the quota to be filled quickly 
(such as that rich people seldom live in slums). The object is to gain 
the benefits of stratification without the high field costs that might be 
incurred in an attempt to select units at random. Varying amounts 
of latitude are permitted to the enumerators. 

Sampling theory cannot be applied to quota methods which contain 
no element of probability sampling. Information about the precision 
of such methods is obtained only when a comparison is possible with a
census or with another sample for which confidence limits can be
computed.


5.21 Stratification with one unit per stratum. If the population is 
highly variable and many effective criteria for stratification are known, 
stratification may be carried to the point where the sample contains 
only one unit in each stratum. In this event the formula previously 
given for estimating $V(\bar{y}_{st})$ cannot be used. An estimate may be
attempted by grouping the strata in pairs. In the two strata which
form a pair, we shall assume that the sizes $N_h$ are equal. Slight
deviations from equality do not vitiate the method. The population means
$\bar{Y}_h$ for the two members of a pair should not differ greatly, but the
allocation into pairs should be made before seeing the sample results, 
for reasons which will become evident. The number of strata should 
be at least 20, to allow a minimum of 10 df in the estimated variance. 

Let the observations in a typical pair be $y_{j1}$, $y_{j2}$, where j goes from
1 to L/2. Then, averaging over all samples from this pair,

$$E(y_{j1} - y_{j2})^2 = (\bar{Y}_{j1} - \bar{Y}_{j2})^2 + \frac{(N_j - 1)}{N_j}(S_{j1}^2 + S_{j2}^2) \qquad (5.64)$$

where $N_j = N_h$ is the size of each stratum in the pair. Consider the
estimate

$$v(\bar{y}_{st}) = \frac{1}{N^2}\sum_{j=1}^{L/2} N_j^2(y_{j1} - y_{j2})^2 \qquad (5.65)$$

By (5.64) the expected value of this quantity is

$$E\,v(\bar{y}_{st}) = \frac{1}{N^2}\left[\sum_{h=1}^{L} N_h(N_h - 1)S_h^2 + \sum_{j=1}^{L/2} N_j^2(\bar{Y}_{j1} - \bar{Y}_{j2})^2\right] \qquad (5.66)$$


The first term on the right is the correct variance (by theorem 5.3 with
$n_h = 1$): the second represents a positive bias. The size of the bias
depends on the success attained in the formation of pairs of strata 
whose true means differ little. The form of estimate (5.65) warns us 
that we should not construct pairs by making the sample values differ 
as little as possible, since this might give a serious underestimate. The 
technique is sometimes referred to colloquially as the method of “col- 
lapsed strata.” 

An alternative method of sampling is to use each pair as a single
stratum, with L/2 strata and two units chosen at random per stratum.
An unbiased estimate of $V(\bar{y}_{st})$ for this kind of sampling is obtainable
from the usual formula. The reader may verify that

$$V(\bar{y}_{st}) = \frac{1}{N^2}\left[\sum_{h=1}^{L} N_h(N_h - 1)\frac{(2N_h - 2)}{(2N_h - 1)}S_h^2 + \sum_{j=1}^{L/2} N_j^2\frac{(N_j - 1)}{(2N_j - 1)}(\bar{Y}_{j1} - \bar{Y}_{j2})^2\right] \qquad (5.67)$$


By comparison with (5.66) it appears that estimate (5.65) is an 
overestimate not only of the true variance with one unit per stratum, 
but also of the variance which would apply if strata twice as large 
were used. 

Whether the smaller strata are preferable, in the light of this result, 
is debatable. Unfortunately, if there is a large gain in precision from 
one unit per stratum as compared with two units per pair, there is 
also a large overestimation of the variance. 
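
For a small population the overestimation can be checked by complete enumeration. The Python sketch below is our own illustration: the function names are assumptions, and the four strata (two pairs of three units each) are hypothetical values not taken from the text. It averages estimate (5.65) over every possible sample of one unit per stratum and compares the result with formula (5.66) and with the true variance.

```python
from itertools import product


def S2(units):
    """Stratum variance with divisor (N_h - 1)."""
    mean = sum(units) / len(units)
    return sum((y - mean) ** 2 for y in units) / (len(units) - 1)


def collapsed_strata_check(pairs):
    """pairs: list of (stratum_1, stratum_2), equal sizes within a pair.

    Returns (true variance with one unit per stratum,
             enumerated average of estimate (5.65),
             value given by formula (5.66)).
    """
    N = sum(len(a) + len(b) for a, b in pairs)
    true_var = enumerated = formula = 0.0
    for a, b in pairs:
        Nj = len(a)
        within = Nj * (Nj - 1) * (S2(a) + S2(b))
        # Theorem 5.3 with n_h = 1: (1/N^2) sum N_h (N_h - 1) S_h^2
        true_var += within / N**2
        # Average of N_j^2 (y_j1 - y_j2)^2 over all samples from the pair
        sq = [(y1 - y2) ** 2 for y1, y2 in product(a, b)]
        enumerated += Nj**2 * (sum(sq) / len(sq)) / N**2
        gap = sum(a) / Nj - sum(b) / Nj
        formula += (within + Nj**2 * gap**2) / N**2
    return true_var, enumerated, formula


# Hypothetical population of 12 units in 4 strata, grouped into 2 pairs.
pairs = [([2, 4, 9], [10, 11, 15]), ([20, 22, 27], [21, 30, 33])]
true_var, enumerated, formula = collapsed_strata_check(pairs)
print(true_var, enumerated, formula)
```

The enumerated average should coincide with (5.66) and exceed the true variance by the term in the squared differences between the paired stratum means.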


5.22 Stratification in analytical studies. In many studies which in- 
volve sampling, we wish to compare the average values of certain 
variables in one part of the population with those in another part, in 
order to find out, for example, whether differences exist between an 
urban and a rural area, or between fee-paying and non-fee-paying 
schools, or between people of different educational levels. Such
investigations may be called analytical, because they involve the study
of relationships within the population rather than a mere description
of the characteristics of the population. In analytical studies, we 
usually try to go further and speculate about the underlying causal 
factors that have led to the observed differences. In the absence of 
controlled experimentation, such speculations tend to be hazardous, 
but in many areas of research the worker has no recourse but to rely 
on a combination of observation and analysis for much of his knowledge. 

The term domains of study has been given by the U.N. Subcommis- 
sion on Sampling to those parts of the population which we wish to 
compare. 

The techniques of planning and analysis appropriate to analytical 
surveys are different from those appropriate to the descriptive sur- 
veys which have been considered thus far in this book. Suppose that 
each domain of study is a separate stratum, this being one of the
simplest ways of designing an analytical survey. If two such strata are
being compared, it is usually of no scientific interest to ask whether
$\bar{Y}_1 = \bar{Y}_2$, because these means would not be exactly equal, except by
a rare chance, even though both strata were drawn at random from
the same infinite population. Instead, we ask whether the two strata
can be regarded as drawn from the same infinite population.
Consequently, in analytical comparisons we omit the fpc when computing
the variances of estimated stratum means.

The rules for allocating the sample sizes in the strata are also
different. If there are only two strata, the variance of the difference
between the sample means $(\bar{y}_1 - \bar{y}_2)$ is

$$V = \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} \qquad (5.68)$$

For a cost function of the form

$$C = a + c_1n_1 + c_2n_2 \qquad (5.69)$$

V is minimized when

$$n_1 = \frac{n\,S_1/\sqrt{c_1}}{S_1/\sqrt{c_1} + S_2/\sqrt{c_2}}, \qquad n_2 = \frac{n\,S_2/\sqrt{c_2}}{S_1/\sqrt{c_1} + S_2/\sqrt{c_2}} \qquad (5.70)$$

With L strata (L > 2), the optimum allocation depends on circumstances.
One method is to minimize the average variance of the difference
between all pairs of strata. From (5.68) it follows that this
average variance is

$$\bar{V} = \frac{2}{L}\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} + \cdots + \frac{S_L^2}{n_L}\right)$$

This is minimized, for fixed C, by the same rule as in (5.70), i.e.

$$n_h \propto \frac{S_h}{\sqrt{c_h}}$$


Alternatively, if the $c_h$ vary little from stratum to stratum but there
are marked variations in the $S_h$, it is sometimes preferable to allocate
so that every mean has the same variance. For this, we choose the
$n_h$ so that

$$n_h \propto S_h^2$$

The subject has many ramifications. The standard techniques of 
analysis of variance and of multiple regression can be applied if each 
domain is a separate stratum, or if the sample is a simple random 
sample. Some complications occur when each domain covers parts of 
several strata which have been set up. 
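
The two allocation rules of this section, one making $n_h$ proportional to $S_h/\sqrt{c_h}$ and the other giving every stratum mean the same variance, can be sketched in Python as below. The function name, the rule labels, and the numerical values are our own assumptions; the sketch distributes a given total n and leaves the rounding to whole units to the user.

```python
from math import sqrt


def analytical_allocation(n, S, c=None, rule="min_average_variance"):
    """Split a total sample size n among domains treated as strata.

    S    : stratum standard deviations S_h
    c    : unit costs c_h (needed only for the first rule)
    rule : "min_average_variance" -> n_h proportional to S_h / sqrt(c_h)
           "equal_variance"       -> n_h proportional to S_h^2
    """
    if rule == "min_average_variance":
        weights = [s / sqrt(ch) for s, ch in zip(S, c)]
    elif rule == "equal_variance":
        weights = [s**2 for s in S]
    else:
        raise ValueError(f"unknown rule: {rule}")
    total = sum(weights)
    return [n * wt / total for wt in weights]


# Hypothetical values: two strata with unequal costs, then three strata.
print(analytical_allocation(100, S=[10, 20], c=[1, 4]))
equal = analytical_allocation(140, S=[1, 2, 3], rule="equal_variance")
print(equal, [s**2 / nh for s, nh in zip([1, 2, 3], equal)])
```

In the second call the quantities $S_h^2/n_h$ printed at the end should all be equal, which is the property the rule is designed to secure.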


5.23 Exercises. 


5.1 The households in a town are to be sampled in order to estimate the 
average amount of assets per household that are readily convertible into cash. 
The households are stratified into a high-rent and a low-rent stratum. A 
house in the high-rent stratum is thought to have about 9 times as much 
assets as one in the low-rent stratum, and $S_h$ is expected to be proportional
to the square root of the stratum mean. 

There are 4000 households in the high-rent stratum and 20,000 in the low- 
rent stratum. (i) How would you distribute a sample of 1000 households 
between the two strata? (ii) If the object is to estimate the difference be- 
tween assets per household in the two strata, how should the sample be dis- 
tributed? 

5.2 The following data show the stratification of all the farms in a county 
by farm size, and the average acres of corn (maize) per farm in each stratum. 


             Number of   Average       Standard
Farm size    farms       corn acres    deviation
(acres)      $N_h$       $\bar{Y}_h$   $S_h$

0-40         394          5.4           8.3
41-80        461         16.3          13.3
81-120       391         24.3          15.1
121-160      334         34.5          19.8
161-200      169         42.1          24.5
201-240      113         50.1          26.0
241-         148         63.8          35.2

Total or mean   2010     26.3

For a sample of 100 farms, compute the sample sizes in each stratum under
(i) proportional allocation, (ii) optimum allocation. Compare the precisions
of these methods with that of simple random sampling.

5.3 If the 7 strata are to be combined into 2 strata, what is the best point 
of division for proportional allocation? What is the relative precision of 2 
strata to 7 strata under proportional allocation? 

5.4 Prove the result stated in formula (5.31), section 5.7:

$$V_{ran} = V_{prop} + \frac{(N - n)}{nN(N - 1)}\left[\sum N_h(\bar{Y}_h - \bar{Y})^2 - \frac{1}{N}\sum (N - N_h)S_h^2\right]$$


5.5 If there are 2 strata and if $\phi$ is the ratio of the actual $n_1/n_2$ to the
optimum $n_1/n_2$ for fixed sample size (as in section 5.8), show that, whatever the
values of $N_1$, $N_2$, $S_1$, and $S_2$, the relative precision of the actual allocation to
the optimum allocation is never less than

$$\frac{4\phi}{(1 + \phi)^2}$$


5.6 The variate $y_i$ follows the frequency distribution $e^{-y_i}\,dy_i$ $(0 < y_i)$.
The population is divided into 2 strata at the point $y_0$, and a stratified random
sample of size n is taken with proportional allocation. Find $V(\bar{y}_{st})$ as a
function of $y_0$. Verify that the value of $y_0$ which minimizes V satisfies the rule
given by Dalenius (section 5.15).

5.7 The following data are derived from a stratified sample of tire dealers
taken in March 1945 (Deming and Simmons, 1946). The dealers were assigned
to strata according to the number of new tires held at a previous census.
The sample means $\bar{y}_h$ are the mean numbers of new tires per dealer.
(i) Estimate the gain in precision due to the stratification. (ii) Compare this
result with the gain that would have been attained from proportional allocation.

Stratum
boundaries   $N_h$     $W_h$     $\bar{y}_h$   $s_h$    $n_h$

1-9          19,850    0.8032     4.1     34.8    3,000
10-19         3,250    0.1315    13.0     92.2      600
20-29         1,007    0.0407    25.0    174.2      340
30-39           606    0.0245    38.2    320.4      230

Totals       24,713    0.9999                     4,170


5.8 For a population with N = 6, L = 2, the values of $y_{hi}$ are 0, 1, 3 in
the first stratum and 5, 6, 9 in the second stratum. Compute (i) $V(\bar{y})$ for a
simple random sample with n = 2, (ii) $V(\bar{y}_{st})$ for a stratified random sample
with 1 unit per stratum, (iii) the average value of $v(\bar{y}_{st})$ as estimated by the
method of collapsed strata. Verify that $E\,v(\bar{y}_{st}) > V(\bar{y})$.


CHAPTER 6
RATIO ESTIMATES 


6.1 Methods of estimation. One feature of the growth of theoretical
statistics during the past 30 years is the emergence of a substantial
body of theory which discusses how to make good estimates from data.
In the development of theory specifically for sample surveys, rela- 
tively little use has been made of this body of knowledge. For this I 
think there are two principal reasons. First, with routine surveys 
which contain a large number of items, there is a great advantage in 
an estimation procedure that requires little more than simple addi- 
tion, whereas the superior methods of estimation in statistical theory, 
such as maximum likelihood, may necessitate a series of successive 
approximations before the estimate can be found. Second, there has 
been a difference in attitude in the two lines of research. Most of the 
estimation methods in theoretical statistics take it for granted that 
we know the functional form of the frequency distribution followed 
by the data in the sample, and the method of estimation is carefully 
geared to this type of distribution. The preference in sample survey 
theory has been to make only limited assumptions about this fre- 
quency distribution, e.g. that it is very skew or rather symmetrical, 
and to leave its specific functional form out of the discussion. This 
attitude is a reasonable one for handling surveys with many items, 
where the type of distribution may change from one item to another,
and where we do not wish to stop and examine all these distributions
before deciding how to make each estimate.

Consequently, estimation techniques for sample survey work are at 
present rather restricted in scope. Two techniques will be considered 
—the ratio method in this chapter and the linear regression method in 
chapter 7. It is possible that the use of more complex methods will 
increase, because the gain in precision from a superior method of
estimation can often be secured fairly cheaply, since only the final
computations are affected.


6.2 The ratio estimate. In the ratio method, an auxiliary variate
$x_i$ correlated with $y_i$ is obtained for each unit in the sample. The
population total X of the $x_i$ must be known. In practice, $x_i$ is often
the value of $y_i$ at some previous time when a complete census was
taken. The aim in this method is to obtain increased precision by
taking advantage of the correlation between $y_i$ and $x_i$. At present,
we assume simple random sampling.

The ratio estimate of Y, the population total of the $y_i$, is

$$\hat{Y}_R = \frac{y}{x}X \qquad (6.1)$$

where y, x are the sample totals of the $y_i$ and $x_i$ respectively.

If $x_i$ is the value of $y_i$ at some previous time, the ratio method uses
the sample to estimate the relative change Y/X that has occurred
since the previous time. The estimated relative change, y/x, is
multiplied by the known population total X on the previous occasion, to
provide an estimate of the current population total. If the ratio
$y_i/x_i$ is nearly the same on all sampling units, the values of y/x vary
little from one sample to another, and the ratio estimate is of high
precision. In another application, $x_i$ may be the total acreage of a
farm and $y_i$ the number of acres sown to some crop. The ratio
estimate will be successful in this case if all farmers devote about the same
percentage of their total acreage to this crop.

If the quantity to be estimated is $\bar{Y}$, the population mean value of
$y_i$, the ratio estimate is

$$\hat{\bar{Y}}_R = \frac{y}{x}\bar{X}$$

Frequently we wish to estimate a ratio rather than a total or mean,
e.g. the ratio of corn acres to wheat acres, the ratio of expenditures
on labor to total expenditures, or the ratio of liquid assets to total
assets. The sample estimate is $\hat{R} = y/x$. In this case X need not be
known.

Example. Table 6.1 shows the number of inhabitants (in 1000’s) 
in each of a simple random sample of 49 cities drawn from the popu- 
lation of 196 large cities previously discussed in section 2.8. The 
problem is to estimate the total number of inhabitants in the 196 
cities in 1930. The true 1920 total, X, is assumed to be known. Its 
value is 22,919. 

The example is a suitable one for the ratio estimate. The majority 
of the cities in the sample show an increase in size from 1920 to 1930 
of the order of 20 per cent. From the sample data we have 


$$y = \sum y_i = 6262; \qquad x = \sum x_i = 5054$$


TABLE 6.1 SIZES OF 49 LARGE UNITED STATES CITIES (in 1000's) IN 1920 ($x_i$) AND 1930 ($y_i$)

$x_i$  $y_i$      $x_i$  $y_i$      $x_i$  $y_i$
 76     80          2     50        243    291
138    143        507    634         87    105
 67     67        179    260         30    111
 29     50        121    113         71     79
381    464         50     64        256    288
 23     48         44     58         43     61
 37     63         77     89         25     57
120    115         64     63         94
 61     69         64     77         43
387    459         56    142        298
 93    104         40     60         36
172    183         40     64        161
 78    106         38     52         74
 66     86        136    139         45
 60     57        116    130         36
 46     65         46     53         50
                                     48

Figure 6.1 Experimental comparison of the ratio estimate with the estimate
based on the sample mean. (Two frequency histograms; horizontal axis: total
population, in millions; the population total is marked.)
Consequently the ratio estimate of the 1930 total for all 196 cities is

$$\hat{Y}_R = \frac{y}{x}X = \frac{6262}{5054}(22,919) = 28,397$$

The corresponding estimate based on the sample mean per city is

$$\hat{Y} = N\bar{y} = \frac{(196)(6262)}{49} = 25,048$$

The correct total in 1930 is 29,351.

Figure 6.1 shows the ratio estimate and the estimate based on the 
sample mean per city for each of 200 simple random samples of size 
49 drawn from this population. A very substantial improvement in 
precision from the ratio method is apparent. 
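
The two estimates in this example can be recomputed from the sample totals alone. The Python sketch below is an illustration; the function names are our own and are not part of the text.

```python
def ratio_estimate_total(y, x, X):
    """Ratio estimate (6.1) of the population total: (y / x) * X."""
    return (y / x) * X


def expansion_estimate_total(y, n, N):
    """Estimate based on the sample mean per unit: N * ybar."""
    return N * y / n


# Sample totals and known quantities from the example on the 196 cities.
y_total, x_total = 6262, 5054
ratio_est = ratio_estimate_total(y_total, x_total, X=22919)
mean_est = expansion_estimate_total(y_total, n=49, N=196)
print(round(ratio_est), round(mean_est))
```

Set against the correct 1930 total of 29,351, the ratio estimate is in error by about 950 and the estimate based on the sample mean by about 4300.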


6.3 Approximate variance and bias of the ratio estimate. The dis- 
tribution of the ratio estimate has proved annoyingly intractable, be- 
cause both y and x vary from sample to sample. The known theoreti- 
cal results fall short of what we would like to know for practical appli- 
cations. The principal results will be stated first without proof. 

The ratio estimate is consistent (this is obvious). It is biased, ex- 
cept for some special types of population, although the bias is negligible 
in large samples. The limiting distribution of the ratio estimate, as n 
becomes very large, is normal, subject to some mild restrictions on 
the type of population from which we are sampling. In samples of 
moderate size, the distribution shows a tendency to positive skewness, 
at least for the kinds of population for which the method is most often 
used. We do not possess exact formulas for the bias and the sampling 
variance of the estimate, but only approximations that are valid in 
large samples. 

These results amount to saying that there is no difficulty if the
sample is large enough so that (i) the ratio is nearly normally
distributed, and (ii) the large-sample formula for its variance is valid.
Deficiencies in the theory are (i) the lack of a well-substantiated rule to
answer the question: When is the sample large enough?, and (ii) the lack
of a serviceable method for estimating confidence limits for small samples.
As a working rule, the large-sample results may be used if the sample
size exceeds 30 and is also large enough so that the coefficients of
variation of $\bar{x}$ and $\bar{y}$ are both less than 10 per cent. This rule is rather
poorly documented as yet.

Theorem 6.1 With a simple random sample of size n, the variance
of $\hat{Y}_R$, the ratio estimate of the population total Y, is approximately

$$V(\hat{Y}_R) = \frac{N(N - n)}{n(N - 1)}\sum_{i=1}^{N}(y_i - Rx_i)^2 \qquad (6.2)$$

where R = Y/X is the population ratio. The approximation assumes
that n is large.
Sketch of proof. The following discussion is not rigorous in that it
does not justify the approximation made in the analysis. The error
in the estimated population total is

$$\hat{Y}_R - Y = \frac{\bar{y}}{\bar{x}}X - Y = \frac{N\bar{X}}{\bar{x}}(\bar{y} - R\bar{x}) \qquad (6.3)$$

since R = Y/X.

If the sample is large, $\bar{x}$ should be close to $\bar{X}$. The approximation
consists in replacing the factor $\bar{X}/\bar{x}$ by 1. This gives

$$\hat{Y}_R - Y \doteq N(\bar{y} - R\bar{x}) \qquad (6.4)$$

Apart from the factor N, the approximate error of estimate in (6.4)
is the mean of the sample values of the variate $u_i = y_i - Rx_i$. We
now apply to the variate $u_i$ theorem 2.2 for the variance of the mean
of a simple random sample. This gives

$$V(\bar{u}) = \frac{(N - n)}{nN}\,\frac{\sum (u_i - \bar{U})^2}{N - 1}$$

where $\bar{U}$ is the population mean of the variate $u_i$. But

$$\bar{U} = \bar{Y} - R\bar{X} = 0$$

by the definition of R. Hence

$$V(\hat{Y}_R) = \frac{N(N - n)}{n(N - 1)}\sum_{i=1}^{N}(y_i - Rx_i)^2$$

Corollary. There are various alternative forms of the result. Since
$\bar{Y} = R\bar{X}$, we may write

$$V(\hat{Y}_R) = \frac{N(N - n)}{n(N - 1)}\sum \{(y_i - \bar{Y}) - R(x_i - \bar{X})\}^2$$

$$= \frac{N(N - n)}{n(N - 1)}\left\{\sum (y_i - \bar{Y})^2 + R^2\sum (x_i - \bar{X})^2 - 2R\sum (y_i - \bar{Y})(x_i - \bar{X})\right\}$$

Let us define the correlation coefficient $\rho$ between $y_i$ and $x_i$ in the finite
population by the equation

$$\rho = \frac{\sum_{i=1}^{N}(y_i - \bar{Y})(x_i - \bar{X})}{(N - 1)S_yS_x}$$

This leads to the result

$$V(\hat{Y}_R) = \frac{N(N - n)}{n}\{S_y^2 + R^2S_x^2 - 2R\rho S_yS_x\} \qquad (6.5)$$

An equivalent form is

$$V(\hat{Y}_R) = \frac{(N - n)}{Nn}Y^2\left\{\frac{S_y^2}{\bar{Y}^2} + \frac{S_x^2}{\bar{X}^2} - \frac{2S_{yx}}{\bar{Y}\bar{X}}\right\} \qquad (6.6)$$

where $S_{yx} = \rho S_yS_x$ is the covariance between $y_i$ and $x_i$. This relation
may also be written as

$$V(\hat{Y}_R) = \frac{(N - n)}{Nn}Y^2\{C_{yy} + C_{xx} - 2C_{yx}\} \qquad (6.7)$$

where $C_{yy}$, $C_{xx}$ are the squares of the coefficients of variation (cv) of
$y_i$ and $x_i$ respectively, and $C_{yx}$ is the analogous relative covariance.

Note 1. As an estimate of the population ratio R, the ratio method
uses the ratio of $\bar{y}$ to $\bar{x}$. Some readers may wonder whether the mean
$\bar{r}$ of the ratios $r_i = y_i/x_i$ on the individual units would be a better
estimate of R. Without going into details, this does not appear to be
the case with simple random sampling, except for some special types
of population (cf. section 6.7). In fact, with a finite population, the
estimate $\bar{r}$ is not consistent, since $\bar{r}$ taken over all sampling units does
not equal R. Moreover, $\bar{r}$ is more tedious to compute than y/x.

Note 2. The approximate formula for the variance of the ratio
estimate can be expressed in terms of the amount of variation in $r_i$
from unit to unit. From equation (6.2) we have

$$V(\hat{Y}_R) = \frac{N(N - n)}{n(N - 1)}\sum x_i^2\left(\frac{y_i}{x_i} - R\right)^2 = \frac{N(N - n)}{n(N - 1)}\sum x_i^2(r_i - R)^2$$


The sum is a weighted sum of squares of the deviations of the r; from 
the population ratio R. If all r; are equal, the approximate variance 
vanishes, as it should, since the ratio estimate is then without error. 

Note 3. For the approximate variance of the ratio $\hat{R} = y/x$, we
divide the preceding formulas by $X^2$. Two forms of the result are as
follows:

$$V(\hat{R}) = \frac{N(N - n)}{n(N - 1)X^2}\sum (y_i - Rx_i)^2 = \frac{(N - n)}{nN}R^2(C_{yy} + C_{xx} - 2C_{yx}) \qquad (6.8)$$

Note 4. Bias. In finding the variance of the ratio estimate of Y in
theorem 6.1, the essential step was to introduce the approximation

$$\hat{Y}_R - Y = \frac{N\bar{X}}{\bar{x}}(\bar{y} - R\bar{x}) \doteq N(\bar{y} - R\bar{x}) \qquad (6.9)$$

Since $E(\bar{y} - R\bar{x}) = 0$, it follows that, to the order of approximation
used in the variance formula, the ratio estimate is unbiased. In order
to find the leading term in the bias of $\hat{Y}_R$, we must take the
approximation one stage further. This is done by writing

$$\frac{\bar{X}}{\bar{x}} = \frac{1}{1 + (\bar{x} - \bar{X})/\bar{X}} \doteq 1 - \frac{(\bar{x} - \bar{X})}{\bar{X}}$$

retaining the first term in the Taylor series expansion. Substitution
in (6.9) gives

$$\hat{Y}_R - Y \doteq N(\bar{y} - R\bar{x})\left(1 - \frac{\bar{x} - \bar{X}}{\bar{X}}\right)$$

Now

$$E\,\bar{x}(\bar{x} - \bar{X}) = E(\bar{x} - \bar{X})^2 = \frac{(N - n)}{Nn}S_x^2$$

Similarly,

$$E\,\bar{y}(\bar{x} - \bar{X}) = E(\bar{y} - \bar{Y})(\bar{x} - \bar{X}) = \frac{(N - n)}{Nn}\rho S_yS_x \qquad (6.10)$$

by theorem 2.3 (p. 17) and the definition of $\rho$. Hence the leading
term in the bias is

$$E(\hat{Y}_R - Y) = \frac{(N - n)}{n\bar{X}}(RS_x^2 - \rho S_yS_x) \qquad (6.11)$$

The bias may be either positive or negative. With increasing sample
size, the bias diminishes as 1/n, whereas the standard error of $\hat{Y}_R$
diminishes as $1/\sqrt{n}$. For any specific population a sample size exists
beyond which the bias is negligible relative to the standard error.
Now, from (6.5),

$$\sigma(\hat{Y}_R) = \sqrt{\frac{N(N - n)}{n}}\sqrt{S_y^2 + R^2S_x^2 - 2R\rho S_yS_x}$$

Hence the ratio of the bias to the standard error is approximately

$$\frac{\text{Bias}}{\sigma(\hat{Y}_R)} = \left\{\sqrt{\frac{N - n}{N}}\,\frac{S_x}{\sqrt{n}\,\bar{X}}\right\}\left\{\frac{(RS_x - \rho S_y)}{\sqrt{S_y^2 + R^2S_x^2 - 2R\rho S_yS_x}}\right\}$$

The term inside the first bracket is the coefficient of variation of $\bar{x}$.
The absolute value of the term inside the second bracket is at most
unity (this is obvious on squaring). Hence

$$\frac{|\text{Bias}|}{\sigma(\hat{Y}_R)} \le \text{cv of } \bar{x} \qquad (6.12)$$

If the sample is large enough so that the cv of $\bar{x}$ is less than 0.1,
the bias is negligible relative to the standard error (section 1.5).
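
The large-sample formulas of this section can be evaluated directly for any small finite population. The Python sketch below is our own illustration: the function name and the eight pairs of values are hypothetical. It computes the variance in the forms (6.2) and (6.5), the leading term of the bias (6.11), and the cv of $\bar{x}$, so that the bound (6.12) can be examined numerically.

```python
from math import sqrt


def ratio_large_sample_summary(y, x, n):
    """Large-sample variance and leading bias of the ratio estimate of Y."""
    N = len(y)
    Ybar, Xbar = sum(y) / N, sum(x) / N
    R = Ybar / Xbar
    Sy2 = sum((yi - Ybar) ** 2 for yi in y) / (N - 1)
    Sx2 = sum((xi - Xbar) ** 2 for xi in x) / (N - 1)
    Syx = sum((yi - Ybar) * (xi - Xbar) for yi, xi in zip(y, x)) / (N - 1)

    # (6.2): variance from the residuals y_i - R x_i
    var_62 = N * (N - n) / (n * (N - 1)) * sum(
        (yi - R * xi) ** 2 for yi, xi in zip(y, x))
    # (6.5): the same quantity, using S_yx = rho * S_y * S_x
    var_65 = N * (N - n) / n * (Sy2 + R * R * Sx2 - 2 * R * Syx)
    # (6.11): leading term in the bias
    bias = (N - n) / (n * Xbar) * (R * Sx2 - Syx)
    # coefficient of variation of the sample mean of x
    cv_xbar = sqrt((N - n) / (N * n)) * sqrt(Sx2) / Xbar
    return {"var_62": var_62, "var_65": var_65,
            "bias": bias, "cv_xbar": cv_xbar}


# Hypothetical population of N = 8 units, sample size n = 4.
x = [10, 12, 15, 20, 22, 30, 35, 40]
y = [12, 15, 17, 26, 25, 38, 40, 50]
out = ratio_large_sample_summary(y, x, n=4)
bias_ratio = abs(out["bias"]) / sqrt(out["var_62"])
print(out, bias_ratio <= out["cv_xbar"])
```

The two variance forms should agree apart from rounding in the arithmetic, and the ratio of the bias to the standard error should not exceed the cv of $\bar{x}$.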


6.4 Estimated variance. From equation (6.2),

$$V(\hat{Y}_R) = \frac{N(N - n)}{n}\,\frac{\sum_{i=1}^{N}(y_i - Rx_i)^2}{N - 1}$$

As a sample estimate of

$$\frac{\sum_{i=1}^{N}(y_i - Rx_i)^2}{N - 1}$$

we take

$$\frac{\sum_{i=1}^{n}(y_i - \hat{R}x_i)^2}{n - 1}$$

This estimate can be shown to have a bias of order 1/n: no method is
available for obtaining an unbiased sample estimate.
For the estimated variance, $v(\hat{Y}_R)$, this gives

$$v(\hat{Y}_R) = \frac{N(N - n)}{n(n - 1)}\sum (y_i - \hat{R}x_i)^2 = \frac{N(N - n)}{n(n - 1)}\left(\sum y_i^2 + \hat{R}^2\sum x_i^2 - 2\hat{R}\sum y_ix_i\right) \qquad (6.13)$$

this being the form which is speediest to compute. Further algebraic
development of this expression leads to sample analogues of
expressions (6.6) and (6.7).

For the estimated variance of the ratio $\hat{R}$, we divide (6.13) by $X^2$.
This gives

$$v(\hat{R}) = \frac{(N - n)}{Nn(n - 1)\bar{X}^2}\left(\sum y_i^2 + \hat{R}^2\sum x_i^2 - 2\hat{R}\sum y_ix_i\right) \qquad (6.14)$$

Note that the sums of squares and products in (6.13) and (6.14)
involve no correction for the sample mean. If $\bar{X}$ is not known, the sample
estimate $\bar{x}$ is substituted in the denominator.

Example. This illustrates the calculation of the standard error of a
ratio estimate of a population total. The data given previously in
table 6.1 will be used. First calculate

$$y = \sum y_i = 6262; \qquad x = \sum x_i = 5054; \qquad \hat{R} = \frac{6262}{5054} = 1.239019$$

Formula (6.13) will be used:

$$v(\hat{Y}_R) = \frac{N(N - n)}{n(n - 1)}\left(\sum y_i^2 + \hat{R}^2\sum x_i^2 - 2\hat{R}\sum y_ix_i\right)$$

To compute the term inside the bracket, the sums of squares and
products are placed on the same row as their multipliers, as follows:

                                Multiplier
$\sum y_i^2$   = 1,527,882      1
$\sum x_i^2$   = 1,044,504      1.535168 = $\hat{R}^2$
$\sum y_ix_i$  = 1,251,630      2.478038 = $2\hat{R}$

Hence

$$v(\hat{Y}_R) = \frac{(196)(147)}{(49)(48)}(29,784) = 364,854$$

$$s(\hat{Y}_R) = 604$$
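
The computation in this example uses only the sample totals and the uncorrected sums of squares and products, so it is easily sketched in Python. The function and argument names below are our own; the numerical inputs are those quoted in the example.

```python
from math import sqrt


def ratio_total_variance(N, n, sum_y, sum_x, sum_y2, sum_x2, sum_xy):
    """Estimated variance (6.13) of the ratio estimate of a total.

    The sums of squares and products are uncorrected for the mean.
    Returns (R_hat, v, s), where s is the standard error sqrt(v).
    """
    R_hat = sum_y / sum_x
    bracket = sum_y2 + R_hat**2 * sum_x2 - 2 * R_hat * sum_xy
    v = N * (N - n) / (n * (n - 1)) * bracket
    return R_hat, v, sqrt(v)


R_hat, v, s = ratio_total_variance(
    N=196, n=49,
    sum_y=6262, sum_x=5054,
    sum_y2=1_527_882, sum_x2=1_044_504, sum_xy=1_251_630,
)
print(round(R_hat, 6), round(v), round(s))
```

Because the sketch carries the bracketed term unrounded, the variance it prints may differ by a few units from the value 364,854 obtained above with the bracket rounded to 29,784; the standard error is unaffected to the nearest unit.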


6.5 Sample size. An estimate of the sample size required to attain
a specified degree of precision is made as follows. The most convenient
starting point is equation (6.8), which gives the variance of the
estimated ratio $\hat{R}$. This equation may be rearranged as follows:

$$C_{RR} = \frac{V(\hat{R})}{R^2} = \frac{(N - n)}{nN}[C_{yy} + C_{xx} - 2C_{yx}] \qquad (6.8')$$

where $C_{RR}$ denotes the square of the cv of $\hat{R}$. If we specify the
desired value of this cv, and hence the value of $C_{RR}$, equation (6.8')
may be solved for n. The first step is to ignore the fpc, (N - n)/N,
giving as an approximation to n

$$n_0 = \frac{C_{yy} + C_{xx} - 2C_{yx}}{C_{RR}}$$

If at this stage the fpc is found to be necessary, the correct solution of
(6.8') is obtained by putting

$$n = \frac{n_0}{1 + \dfrac{n_0}{N}}$$

With the ratio method, the cv's of $\hat{R}$, of the estimated population
total $\hat{Y}_R$, and of the estimated population mean per unit are all equal.
Hence the equations above apply to all three types of estimation. In
order to use these results, we must estimate in advance the cv's of
$y_i$ and $x_i$ and the correlation coefficient.
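
The two steps just described may be sketched in Python as follows. The function name, its arguments, and the advance estimates used in the call are hypothetical and serve only to illustrate the calculation.

```python
def ratio_sample_size(C_yy, C_xx, C_yx, target_cv, N=None):
    """Sample size giving the ratio estimate a specified cv.

    C_yy, C_xx : squared cv's of y_i and x_i (advance estimates)
    C_yx       : relative covariance, rho * cv(y) * cv(x)
    target_cv  : desired cv of the estimate, so C_RR = target_cv ** 2
    N          : population size; if None the fpc is ignored
    """
    C_RR = target_cv**2
    n0 = (C_yy + C_xx - 2 * C_yx) / C_RR   # first approximation
    if N is None:
        return n0
    return n0 / (1 + n0 / N)               # solution of (6.8') with the fpc


# Hypothetical advance estimates: both cv's equal to 1, correlation 0.9.
n0 = ratio_sample_size(1.0, 1.0, 0.9, target_cv=0.05)
n = ratio_sample_size(1.0, 1.0, 0.9, target_cv=0.05, N=1000)
print(n0, n)
```

The returned values are not rounded; in practice the result would be rounded up to the next whole unit.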


6.6 Confidence limits. If the sample size is large enough so that the
normal approximation applies, confidence limits for Y and R may be
obtained as follows:

$$Y: \quad \hat{Y}_R \pm t\sqrt{v(\hat{Y}_R)} \qquad (6.15)$$

$$R: \quad \hat{R} \pm t\sqrt{v(\hat{R})} \qquad (6.16)$$

where t is the normal deviate corresponding to the chosen confidence
probability.

In section 6.3 it was suggested that the normal approximation
applies reasonably well if the sample size is at least 30 and is large enough
so that the cv's of $\bar{y}$ and $\bar{x}$ are both less than 0.1. When these
conditions do not apply, the formula for $v(\hat{R})$ tends to give values that are
too low and the positive skewness in the distribution of $\hat{R}$ may
become noticeable.

An alternative method of computing confidence limits has been
used in biological assay (Fieller, 1932; Paulson, 1942). This approach
requires fewer assumptions than the normal approximation and takes
some account of the skewness of the distribution of $\hat{R}$.

The method requires that 7 and ž follow a bivariate normal distri- 
bution, so that (7 — RZ) is normally distributed. It follows that the 
quantity 


—— (6.17) 


N , 
J 2 V sy + R's? — 2Rsyz 
Nn 


is approximately normally distributed with mean zero and unit stand- 
ard deviation. (We have substituted sample estimates Sy, etc., for 
the corresponding population variances and covariance, and are as- 
suming the sample size large enough so that this introduces negligible 
error, In biological assay, where samples may be quite small, the 
quantity above would be regarded as following Student’s t-distribu- 
tion. 

a value of R is unknown, but any contemplated value of R which 
makes this normal deviate large enough may be regarded as rejected 
by the sample data. Consequently, confidence limits for R are found 
by setting (6.17) equal to +é, and solving the resulting quadratic 
equation for R. The confidence limits are approximate, because if 
we try to check them by sampling repeatedly from a fixed population 
with known R, some values of 7 and & turn up for which the two roots 
of the quadratic are imaginary. Such cases become rare if the ev’s 
of g and Z are less than 0.3. 

After some manipulation, the two roots may be expressed as

$$R = \hat{R}\,\frac{(1 - t^2 c_{yx}) \pm t\sqrt{(c_{yy} + c_{xx} - 2c_{yx}) - t^2(c_{yy}c_{xx} - c_{yx}^2)}}{1 - t^2 c_{xx}} \qquad (6.18)$$

where

$$c_{yy} = \frac{N-n}{Nn}\,\frac{s_y^2}{\bar{y}^2}$$

is the square of the estimated cv of $\bar{y}$, with analogous definitions of
$c_{xx}$ and $c_{yx}$. If $t^2c_{yy}$, $t^2c_{xx}$, and $t^2c_{yx}$ are all small relative to 1, the
limits reduce to

$$R = \hat{R} \pm t\hat{R}\sqrt{c_{yy} + c_{xx} - 2c_{yx}}$$

This expression is the same as the normal approximation (6.16).

Quadratic limits for $Y$ are found by replacing $\hat{R}$ in equation (6.18)
by $\hat{Y}_R$.

The quadratic limits should always be at least as good approximations
as the normal limits, since they depend on fewer assumptions.
They are slower to compute, however, and are not a complete solution
to the problem because in sampling from skew populations the
distributions of $\bar{y}$ and $\bar{x}$ may themselves be skew.
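As a rough illustration of how the normal and quadratic limits differ, the following Python sketch evaluates both from the estimated squared cv's. The function names and the numerical inputs are hypothetical, and the sketch assumes a positive estimated ratio. It returns `None` for the quadratic limits when the roots are imaginary or when $1 - t^2c_{xx}$ is not positive, in which case the quadratic does not give a finite interval.

```python
import math


def normal_limits(r_hat, c_yy, c_xx, c_yx, t=1.96):
    """Normal-approximation limits for R, as in (6.16)."""
    half_width = t * r_hat * math.sqrt(c_yy + c_xx - 2.0 * c_yx)
    return r_hat - half_width, r_hat + half_width


def quadratic_limits(r_hat, c_yy, c_xx, c_yx, t=1.96):
    """Quadratic limits for R, from the two roots of (6.17) set to +/- t.

    c_yy, c_xx, c_yx are the estimated squared cv's and relative covariance
    of the sample means. Assumes r_hat > 0.
    """
    denom = 1.0 - t * t * c_xx
    inner = (c_yy + c_xx - 2.0 * c_yx) - t * t * (c_yy * c_xx - c_yx ** 2)
    if denom <= 0.0 or inner < 0.0:
        return None                       # no finite real interval
    centre = 1.0 - t * t * c_yx
    spread = t * math.sqrt(inner)
    return r_hat * (centre - spread) / denom, r_hat * (centre + spread) / denom


# Hypothetical inputs, chosen only to show the calculation.
r_hat, c_yy, c_xx, c_yx = 2.0, 0.010, 0.008, 0.006
lo_n, hi_n = normal_limits(r_hat, c_yy, c_xx, c_yx)
lo_q, hi_q = quadratic_limits(r_hat, c_yy, c_xx, c_yx)
print((round(lo_n, 3), round(hi_n, 3)), (round(lo_q, 3), round(hi_q, 3)))
```

For these inputs the quadratic interval is not symmetric about $\hat{R}$, which is how the method takes some account of skewness.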


6.7 Comparison of the ratio estimate with the mean per unit. The
type of estimate of $Y$ which was studied in previous chapters is $N\bar{y}$,
where $\bar{y}$ is the mean per unit for the sample (in simple random
sampling) or a weighted mean per unit (in stratified random sampling).
Estimates of this kind will be called estimates based on the mean per
unit or estimates obtained by simple expansion.


Theorem 6.2 In large samples, with simple random sampling, the
ratio estimate $\hat{Y}_R$ has a smaller variance than the estimate $\hat{Y} = N\bar{y}$
obtained by simple expansion, if

$$\rho > \frac{1}{2}\left(\frac{S_x}{\bar{X}}\right)\Big/\left(\frac{S_y}{\bar{Y}}\right) = \frac{\text{Coefficient of variation of } x_i}{2(\text{Coefficient of variation of } y_i)}$$

Proof: For $\hat{Y}$ we have

$$V(\hat{Y}) = \frac{N(N-n)}{n}\,S_y^2$$

For the ratio estimate we have from (6.5)

$$V(\hat{Y}_R) = \frac{N(N-n)}{n}\,\{S_y^2 + R^2S_x^2 - 2R\rho S_yS_x\}$$

Hence the ratio estimate has the smaller variance if

$$S_y^2 + R^2S_x^2 - 2R\rho S_yS_x < S_y^2$$

i.e. if

$$\rho > \frac{RS_x}{2S_y} = \frac{1}{2}\left(\frac{S_x}{\bar{X}}\right)\Big/\left(\frac{S_y}{\bar{Y}}\right)$$

This theorem shows that the ratio estimate may be either more or
less precise than simple expansion. The issue depends on the size
of the correlation coefficient between $y_i$ and $x_i$ and on the cv's of
these two variates. The variability of the auxiliary variate $x_i$ is an
important factor: if its cv is more than twice that of $y_i$, the ratio
estimate is always less precise, since $\rho$ cannot exceed 1. When $x_i$ is the
value of $y_i$ at some previous time, the two cv's may be about equal.
In this event the ratio estimate is superior if $\rho$ exceeds 0.5.

Theorem 6.2 applies only for samples large enough so that the
approximate formula for $V(\hat{Y}_R)$ is valid. In smaller samples the ratio
method probably does not compare as favorably as the theorem
suggests, since the approximate formula is usually an underestimate.
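The condition in theorem 6.2 is easy to evaluate numerically. Dividing the two large-sample variances in the proof gives $V(\hat{Y}_R)/V(\hat{Y}) = 1 + k^2 - 2\rho k$, where $k$ is the cv of $x_i$ divided by the cv of $y_i$. The Python sketch below uses that identity; the function names are illustrative and the inputs are made-up values, and the sketch assumes $R > 0$.

```python
def ratio_is_more_precise(cv_x, cv_y, rho):
    """Large-sample condition of theorem 6.2: rho > cv_x / (2 cv_y)."""
    return rho > cv_x / (2.0 * cv_y)


def variance_relative_to_expansion(cv_x, cv_y, rho):
    """V(ratio estimate) / V(simple expansion) in large samples.

    Follows from dividing {S_y^2 + R^2 S_x^2 - 2 R rho S_y S_x} by S_y^2
    and noting that R S_x / S_y = cv_x / cv_y.
    """
    k = cv_x / cv_y
    return 1.0 + k * k - 2.0 * rho * k


# Equal cv's (x_i is y_i at a previous time): break-even at rho = 0.5.
print(variance_relative_to_expansion(1.0, 1.0, 0.5))
print(variance_relative_to_expansion(1.0, 1.0, 0.8))
# cv of x_i more than twice that of y_i: the ratio estimate loses even at rho = 1.
print(ratio_is_more_precise(2.1, 1.0, 1.0))
```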


6.8 Conditions under which the ratio estimate is optimum. A well- 
known result in the theory of regression indicates the type of popula- 
tion for which the ratio estimate is the best among a wide class of 
estimates. The theorem applies only to infinite populations. 


Theorem 6.3 With simple random sampling from an infinite
population, the ratio estimate of $Y$ is a "best linear unbiased estimate" if
two conditions are satisfied:

(i) the relation between $y_i$ and $x_i$ is a straight line through the
origin, and

(ii) the variance of $y_i$ about this line is proportional to $x_i$.

A "best linear unbiased estimate" is defined as follows. Consider
all estimates that are linear functions of the sample values $y_i$, i.e.
that are of the form

$$l_1y_1 + l_2y_2 + \cdots + l_ny_n$$

where the $l$'s do not depend on the $y_i$, although they may be functions
of the $x_i$. The choice of the $l$'s is restricted to those that give unbiased
estimates of $Y$. The estimate that has the smallest variance is called
the "best linear unbiased estimate."

Proof: The mathematical model is

$$y_i = Bx_i + e_i$$

where the $e_i$ are independent of the $x_i$. In arrays in which $x_i$ is fixed,
$e_i$ has mean zero and variance $\lambda x_i$. Hence

$$Y = BX$$

It was shown by Gauss that the best linear unbiased estimate of $BX$
is $bX$, where $b$ is the least squares estimate of $B$ (see e.g. David and
Neyman, 1938). The least squares estimate is

$$b = \frac{\sum w_iy_ix_i}{\sum w_ix_i^2}, \qquad \text{where } w_i = \frac{1}{\sigma_{e_i}^2} = \frac{1}{\lambda x_i}$$

This gives

$$b = \frac{\sum y_i}{\sum x_i} = \frac{\bar{y}}{\bar{x}}$$

Consequently, the optimum estimate of $Y$ is the ratio estimate $(\bar{y}/\bar{x})X$.

The practical relevance of this result is that it suggests the conditions
under which the ratio estimate is not only superior to the mean
per unit, but is the best of a whole class of estimates. When we are
trying to decide what kind of estimate to use, a graph in which $y_i$
is plotted against $x_i$ is helpful. If this graph shows that the relation
is a straight line through the origin, and if the variance of the points
$y_i$ about the line seems to increase proportionally to $x_i$, the ratio
estimate will be hard to beat.

Sometimes the relation between $y_i$ and $x_i$ is a straight line through
the origin, but the variance of $y_i$ in arrays in which $x_i$ is fixed is not
proportional to $x_i$. In a population sample of Greece, Jessen et al.
(1947) found that the variance increased roughly as $x_i^2$. This suggests
a weighted regression in which $w_i \propto 1/x_i^2$. For the least squares
estimate $b$, this gives

$$b = \frac{\sum w_iy_ix_i}{\sum w_ix_i^2} = \frac{1}{n}\sum\left(\frac{y_i}{x_i}\right)$$

In this situation the best estimate of $Y$ is $bX$, where $b$ is the mean of
the ratios $y_i/x_i$ on the individual sampling units.

Under the conditions of theorem 6.3, the ratio estimate is unbiased.
This result does not contradict an earlier statement (section 6.3) that
the ratio estimate is in general biased. The ratio estimate is unbiased,
for any size of simple random sample, if the population is infinite and
the relation between $y_i$ and $x_i$ is a straight line through the origin. The
proof is left as an exercise to the reader.
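The effect of the weights on the least squares slope can be checked directly. The short Python sketch below is an illustration with made-up data; the function name `weighted_slope_through_origin` is hypothetical. Weights proportional to $1/x_i$ reproduce the ratio $\sum y_i/\sum x_i$, and weights proportional to $1/x_i^2$ reproduce the mean of the individual ratios $y_i/x_i$.

```python
def weighted_slope_through_origin(x, y, weights):
    """Weighted least squares slope for a line through the origin:
    b = sum(w y x) / sum(w x^2)."""
    num = sum(w * yi * xi for w, xi, yi in zip(weights, x, y))
    den = sum(w * xi * xi for w, xi in zip(weights, x))
    return num / den


# Made-up sample values, for illustration only.
x = [1.0, 2.0, 4.0, 8.0]
y = [2.0, 3.0, 9.0, 15.0]

# Residual variance proportional to x_i: weights 1/x_i give the ratio estimate.
b_ratio = weighted_slope_through_origin(x, y, [1.0 / xi for xi in x])

# Residual variance proportional to x_i^2: weights 1/x_i^2 give the mean of ratios.
b_mean_of_ratios = weighted_slope_through_origin(x, y, [1.0 / xi ** 2 for xi in x])

print(b_ratio, sum(y) / sum(x))
print(b_mean_of_ratios, sum(yi / xi for xi, yi in zip(x, y)) / len(x))
```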


6.9 The ratio estimate in sampling for proportions. The ratio method
plays an important role in the estimation of proportions. With simple
random sampling, the usual formula for the variance of an estimated
proportion $p$ is

$$V(p) = \frac{PQ}{n}$$

where $P$ is the population proportion. [The factor $(N - n)/(N - 1)$
is inserted if the fpc is needed.]

As was pointed out in section 3.2, this formula is valid only if the
sample is a simple random sample of units, each of which is classified
into the two classes from which the proportion is derived. For
instance, if the proportion of diseased plants in a wheat field is estimated
by sampling, this formula applies if a simple random sample of
individual plants has been taken, each plant being classified as diseased or
healthy. It is unlikely that this method of sampling would be used.
A more typical unit would be a compact area, say 1 ft long by 3 rows
wide, all plants in each sampled area being classified.

In such situations the sampling unit consists of a group or cluster
of smaller units, which we may call elements. Let the $i$th unit contain
$x_i$ elements. Each element is assigned to one of two classes, $C$ or $C'$.
Let $y_i$ be the number of elements in the $i$th unit which lie in $C$. The
proportion of elements in $C$ in the population is

$$P = \frac{\sum_{i=1}^{N} y_i}{\sum_{i=1}^{N} x_i}$$

The sample estimate of this proportion is

$$p = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}$$

Structurally, this is a typical ratio estimate. Hence, the variance of
$p$ is obtained by the formula appropriate to the ratio method.
Two equivalent forms for the approximate variance are

$$V(p) = \frac{N-n}{nN(N-1)\bar{X}^2}\sum_{i=1}^{N}(y_i - Px_i)^2 = \frac{N-n}{nN(N-1)\bar{X}^2}\sum_{i=1}^{N}x_i^2(p_i - P)^2$$

where $p_i = y_i/x_i$ is the proportion in the $i$th unit and $\bar{X} = X/N$ is the
average number of elements in the cluster unit. For the estimated
variance, we have

$$v(p) = \frac{N-n}{nN(n-1)\bar{x}^2}\left[\sum y_i^2 + p^2\sum x_i^2 - 2p\sum y_ix_i\right] \qquad (6.19)$$


Example. A simple random sample of 30 households was drawn
from a census taken in 1947 in wards 6 and 7 of the Eastern Health
District of Baltimore. The population contains about 15,000 households.
In table 6.2 the persons in each household are classified (i)
as to whether they had consulted a doctor in the past 12 months, (ii)
as to sex.

TABLE 6.2 DATA FOR A SIMPLE RANDOM SAMPLE OF 30 HOUSEHOLDS

Columns: Household No. | No. of persons $x_i$ | No. of males $y_i$ | No. of females | Doctor seen in last year: Yes $y_i$ | Doctor seen in last year: No

[The 30 household rows are not legible in this copy.]

Totals | No. of persons: 104

Our purpose is to contrast the ratio formula with the inappropriate
binomial formula. Consider first the proportion of people who had
consulted a doctor. For the binomial formula, we would take

$$n = 104: \qquad p = \frac{30}{104} = 0.2885$$

Hence

$$v_{bin}(p) = \frac{pq}{n} = \frac{(0.2885)(0.7115)}{104} = 0.00197$$

For the ratio formula, we note that there are 30 groups and take:

$n = 30$
$x_i$ = Total number in $i$th household
$y_i$ = Number in $i$th household who had seen a doctor
$p = 0.2885$, as before
$\bar{x} = 104/30 = 3.4667$
$\sum y_i^2 = 86$; $\sum x_i^2 = 404$; $\sum y_ix_i = 113$

The fpc may be ignored. Hence, from (6.19),

$$v(p) = \frac{(86) + (0.2885)^2(404) - 2(0.2885)(113)}{(30)(29)(3.4667)^2} = 0.00520$$


The variance given by the ratio method, 0.00520, is much larger 
than that given by the binomial formula, 0.00197. This happens be- 
cause, for various reasons, families differ in the frequency with which 
their members consult a doctor. For the sample as a whole, the pro- 
portion who consult a doctor is only a little over 1 in 4, but there are 
several families in which every member has seen a doctor. Similar 
results would be obtained for any characteristic in which the members 
of the same family tend to act in the same way. 
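The two calculations for the doctor-visit proportion can be reproduced from the sample totals quoted above. The Python sketch below is a minimal illustration that ignores the fpc, as the text does; the function names are chosen for the sketch.

```python
def binomial_variance(p, n_elements):
    """Binomial formula pq/n, appropriate only when elements are sampled singly."""
    return p * (1.0 - p) / n_elements


def cluster_ratio_variance(n_units, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Estimated variance (6.19) of p = sum(y)/sum(x), fpc ignored."""
    p = sum_y / sum_x
    x_bar = sum_x / n_units
    numerator = sum_y2 + p * p * sum_x2 - 2.0 * p * sum_xy
    return numerator / (n_units * (n_units - 1) * x_bar ** 2)


# Totals from the household sample: 30 households, 104 persons,
# 30 of whom had seen a doctor.
p = 30 / 104
v_bin = binomial_variance(p, 104)
v_ratio = cluster_ratio_variance(30, 104, 30, 404, 86, 113)
print(round(v_bin, 5), round(v_ratio, 5), round(v_ratio / v_bin, 2))
```

The last printed figure is the factor by which the binomial formula understates the variance for this characteristic.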

In estimating the proportion of males in the population, the results
are different. By the same type of calculation as above, we find:

Binomial formula: $v(p) = 0.00240$

Ratio formula: $v(p) = 0.00114$

Here the binomial formula overestimates the variance. The reason is
interesting. Most households are set up as a result of a marriage, and
hence contain at least 1 male and 1 female. Consequently the
proportion of males per family varies less from 1/2 than would be expected
from the binomial formula. None of the 30 families, except one with
only 1 member, is composed entirely of males, or entirely of females.
If the binomial distribution were applicable, with a true $P$ of
approximately 1/2, households with all members of the same sex would
constitute one-quarter of the households of size 3 and one-eighth of the
households of size 4. This property of the sex ratio has been
discussed by Hansen and Hurwitz (1942).


6.10 The approach to normality of the distribution of the ratio. The
result that the limiting distribution of the ratio $\bar{y}/\bar{x}$ in large samples is
normal comes from standard theorems in probability. We shall quote,
as a lemma, a result of wide generality given by Cramér (1946). This
result, which assumes an infinite population, provides the limiting
distribution of any known function of any two central moments $m_1$,
$m_2$, say, calculated from a sample of pairs of values $y_i$, $x_i$. By a
central moment of the sample, we mean a quantity of the form

$$m = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^u(x_i - \bar{x})^w$$

where $u$ and $w$ are positive integers. The corresponding moment for
the whole population is

$$M = E(y_i - \bar{Y})^u(x_i - \bar{X})^w$$

averaged over all units in the population.

Lemma (Cramér). If in some neighborhood of the point $m_1 = M_1$,
$m_2 = M_2$, the function $H(m_1, m_2)$ is continuous and has continuous
derivatives of the first and second orders with respect to $m_1$ and $m_2$,
then the limiting distribution of the random variable $H(m_1, m_2)$, as
$n$ becomes large, is normal with

$$\text{Mean} = H(M_1, M_2)$$

$$\text{Variance} = \sigma_{m_1}^2\left(\frac{\partial H}{\partial m_1}\right)^2 + 2\,\text{cov}(m_1, m_2)\left(\frac{\partial H}{\partial m_1}\right)\left(\frac{\partial H}{\partial m_2}\right) + \sigma_{m_2}^2\left(\frac{\partial H}{\partial m_2}\right)^2$$

where the partial derivatives are computed at the point $m_1 = M_1$,
$m_2 = M_2$.


Theorem 6.4 If the population variances $S_y^2$, $S_x^2$ are finite, and if
the population mean $\bar{X} \neq 0$, the limiting distribution of $\bar{y}/\bar{x}$, in
random samples of size $n$ from an infinite population, is normal with

$$\text{Mean} = \frac{\bar{Y}}{\bar{X}}$$

$$\text{Variance} = \frac{1}{n}R^2\left[\frac{S_y^2}{\bar{Y}^2} - \frac{2\rho S_yS_x}{\bar{Y}\bar{X}} + \frac{S_x^2}{\bar{X}^2}\right]$$

Proof: In order to apply the lemma we take

$$m_1 = \bar{y}; \qquad m_2 = \bar{x}; \qquad H(m_1, m_2) = \frac{\bar{y}}{\bar{x}}$$

Consequently $M_1 = \bar{Y}$, $M_2 = \bar{X}$.

The function $H$ is continuous and has continuous partial derivatives
of the first and second orders in some neighborhood of the point $\bar{y} = \bar{Y}$,
$\bar{x} = \bar{X}$, provided that $\bar{X} \neq 0$. Further, at this point,

$$\frac{\partial H}{\partial m_1} = \frac{1}{M_2} = \frac{1}{\bar{X}}; \qquad \frac{\partial H}{\partial m_2} = -\frac{M_1}{M_2^2} = -\frac{\bar{Y}}{\bar{X}^2}$$

Hence, from the lemma, the limiting distribution of $\bar{y}/\bar{x}$ is normal,
with

$$\text{Mean} = \frac{\bar{Y}}{\bar{X}}$$

$$\text{Variance} = \frac{1}{n}\left[\frac{S_y^2}{\bar{X}^2} - \frac{2\rho S_yS_x\bar{Y}}{\bar{X}^3} + \frac{\bar{Y}^2S_x^2}{\bar{X}^4}\right] = \frac{1}{n}R^2\left[\frac{S_y^2}{\bar{Y}^2} - \frac{2\rho S_yS_x}{\bar{Y}\bar{X}} + \frac{S_x^2}{\bar{X}^2}\right]$$

The result is the same as that stated previously in equation (6.8),
apart from the fpc. So far as practical applications are concerned,
the generality of the result is pleasing, the only restrictions being that
$y_i$ and $x_i$ have finite variances and that $\bar{X} \neq 0$. Theorems given by
Madow (1948) enable the result to be extended to finite populations,
subject to further mild restrictions on the nature of the population.
This extension will not be discussed here.
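The large-sample variance in theorem 6.4 can be compared with a small Monte Carlo experiment. The Python sketch below is illustrative only: it draws pairs from a bivariate normal population (an assumption made for convenience, since the theorem itself requires only finite variances and $\bar{X} \neq 0$), and all parameter values and function names are hypothetical.

```python
import math
import random


def limiting_variance(n, mean_y, mean_x, sd_y, sd_x, rho):
    """Variance of ybar/xbar from theorem 6.4 (infinite population)."""
    r = mean_y / mean_x
    bracket = (sd_y ** 2 / mean_y ** 2
               - 2.0 * rho * sd_y * sd_x / (mean_y * mean_x)
               + sd_x ** 2 / mean_x ** 2)
    return r * r * bracket / n


def simulated_variance(n, reps, mean_y, mean_x, sd_y, sd_x, rho, seed=0):
    """Empirical variance of ybar/xbar over `reps` samples of size n."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        sum_x = sum_y = 0.0
        for _ in range(n):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            sum_x += mean_x + sd_x * z1
            # Correlated normal deviate for y with correlation rho.
            sum_y += mean_y + sd_y * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        ratios.append(sum_y / sum_x)
    centre = sum(ratios) / reps
    return sum((r - centre) ** 2 for r in ratios) / (reps - 1)


# Hypothetical population parameters.
params = dict(mean_y=50.0, mean_x=100.0, sd_y=10.0, sd_x=20.0, rho=0.8)
theory = limiting_variance(100, **params)
empirical = simulated_variance(100, 2000, **params)
print(theory, empirical)
```

With the cv of $\bar{x}$ as small as it is here, the two figures would be expected to agree closely; the agreement should deteriorate as the sample size is reduced.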


6.11 Ratio estimates in stratified random sampling. There are several
ways in which a ratio estimate of the population total $Y$ can be
made. One is to make a separate ratio estimate of the total of each
stratum and add these totals. If $y_h$, $x_h$ are the sample totals in the
$h$th stratum and $X_h$ is the stratum total of the $x_{hi}$, this estimate $\hat{Y}_{Rs}$
($s$ for separate) is

$$\hat{Y}_{Rs} = \sum_h \frac{y_h}{x_h}X_h \qquad (6.20)$$

No assumption is made that the true ratio remains constant from
stratum to stratum: the estimate requires, however, a knowledge of the
separate totals $X_h$.


Theorem 6.5 If the sample sizes $n_h$ are large in all strata,

$$V(\hat{Y}_{Rs}) = \sum_h \frac{N_h(N_h - n_h)}{n_h}\,[S_{yh}^2 + R_h^2S_{xh}^2 - 2R_h\rho_hS_{yh}S_{xh}] \qquad (6.21)$$

where $R_h = Y_h/X_h$ is the true ratio in stratum $h$, and $\rho_h$ is defined as
before in each stratum.

Proof: Write

$$\hat{Y}_{Rh} = \frac{y_h}{x_h}X_h$$

Then

$$(\hat{Y}_{Rs} - Y) = \sum_h (\hat{Y}_{Rh} - Y_h)$$

Hence

$$V(\hat{Y}_{Rs}) = E(\hat{Y}_{Rs} - Y)^2 = \sum_h E(\hat{Y}_{Rh} - Y_h)^2 + 2\sum_{h<j} E(\hat{Y}_{Rh} - Y_h)(\hat{Y}_{Rj} - Y_j)$$

Since $\hat{Y}_{Rh}$ is the ratio estimate made from a simple random sample
within stratum $h$, we may use formula (6.5) for the approximate
variance of $\hat{Y}_{Rh}$, i.e.

$$V(\hat{Y}_{Rh}) = \frac{N_h(N_h - n_h)}{n_h}\,[S_{yh}^2 + R_h^2S_{xh}^2 - 2R_h\rho_hS_{yh}S_{xh}]$$

The cross-product terms vanish, because the sampling is independent
in the different strata and, to the order of approximation used in the
variance formula, $\hat{Y}_{Rh}$ is an unbiased estimate of $Y_h$. Result (6.21)
follows.

This formula is valid only if the sample in each stratum is large
enough so that the approximate variance formula applies to each
stratum. This limitation should be noted in practical applications.
We do not possess a trustworthy variance formula for $\hat{Y}_{Rs}$ when the
$n_h$ are small.

Moreover, when the $n_h$ are small and $L$ is large, the bias in $\hat{Y}_{Rs}$
may not be negligible relative to its standard error, as the following
crude argument suggests.

In a single stratum we have seen (section 6.3) that

$$\frac{|\text{Bias in } \hat{Y}_{Rh}|}{\sigma(\hat{Y}_{Rh})} < \text{cv of } \bar{x}_h$$

If the bias has the same sign in all strata, as may happen, the bias in
$\hat{Y}_{Rs}$ will be roughly $L$ times that in $\hat{Y}_{Rh}$. But the standard error of
$\hat{Y}_{Rs}$ is only of the order of $\sqrt{L}$ times that of $\hat{Y}_{Rh}$. Hence the ratio

$$\frac{|\text{Bias in } \hat{Y}_{Rs}|}{\sigma(\hat{Y}_{Rs})}$$

might be as large as

$$\sqrt{L}\,(\text{cv of } \bar{x}_h)$$

For example, with 50 strata and the cv of $\bar{x}_h$ about 0.1 in each
stratum, the bias in $\hat{Y}_{Rs}$ might be as large as 0.7 times its standard
error.

In the present state of our knowledge, $\hat{Y}_{Rs}$ is to be avoided unless
$\sqrt{L}$ (cv of $\bar{x}_h$) appears to be less than 0.2. This rule is probably too
conservative, because in practice the bias may be much smaller than
its upper bound, particularly if within each stratum the relation
between $y_{hi}$ and $x_{hi}$ is approximately a straight line through the origin.
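The working rule above amounts to a one-line check. The Python sketch below is an illustration; the function names are hypothetical, and the sketch assumes, as the crude argument does, that the cv of $\bar{x}_h$ is about the same in every stratum.

```python
import math


def bias_to_se_bound(n_strata, cv_xbar_h):
    """Crude upper bound sqrt(L) * (cv of xbar_h) on |bias| / standard error
    for the separate ratio estimate."""
    return math.sqrt(n_strata) * cv_xbar_h


def separate_ratio_acceptable(n_strata, cv_xbar_h, threshold=0.2):
    """Working rule from the text: use the separate ratio estimate only if
    the bound is below the threshold (0.2 in the text)."""
    return bias_to_se_bound(n_strata, cv_xbar_h) < threshold


print(round(bias_to_se_bound(50, 0.1), 2))      # the example in the text
print(separate_ratio_acceptable(50, 0.1))
print(separate_ratio_acceptable(4, 0.05))
```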


6.12 The combined ratio estimate. An alternative estimate is derived
from a single combined ratio (Hansen, Hurwitz, and Gurney,
1946). From the sample data we compute

$$\hat{Y}_{st} = \sum_h N_h\bar{y}_h, \qquad \hat{X}_{st} = \sum_h N_h\bar{x}_h$$

These are the standard estimates of the population totals $Y$ and $X$,
respectively, made from a stratified sample. The combined ratio
estimate, $\hat{Y}_{Rc}$ ($c$ for combined) is

$$\hat{Y}_{Rc} = \frac{\hat{Y}_{st}}{\hat{X}_{st}}X = \frac{\bar{y}_{st}}{\bar{x}_{st}}X$$

where $\bar{y}_{st} = \hat{Y}_{st}/N$, $\bar{x}_{st} = \hat{X}_{st}/N$ are the estimated population means
from a stratified sample.
The estimate $\hat{Y}_{Rc}$ does not require a knowledge of the $X_h$, but only
of $X$.


Theorem 6.6 If the total sample size $n$ is large,

$$V(\hat{Y}_{Rc}) = \sum_h \frac{N_h(N_h - n_h)}{n_h}\,[S_{yh}^2 + R^2S_{xh}^2 - 2R\rho_hS_{yh}S_{xh}] \qquad (6.22)$$

Proof: This follows the same argument as theorem 6.1. In the
present case the key equation (6.4) appears as

$$(\hat{Y}_{Rc} - Y) \doteq N(\bar{y}_{st} - R\bar{x}_{st}) \qquad (6.23)$$

Now consider the variate $u_{hi} = y_{hi} - Rx_{hi}$. The right side of equation
(6.23) is $N\bar{u}_{st}$, where $\bar{u}_{st}$ is the weighted mean of the variate $u_{hi}$
in a stratified sample. Further, the population mean $\bar{U}$ of $u_{hi}$ is zero,
since $R = Y/X$.

Hence we may apply to $\bar{u}_{st}$ theorem 5.3 for the variance of the
estimated mean from a stratified random sample. This gives

$$V(\hat{Y}_{Rc}) = N^2V(\bar{u}_{st}) = \sum_h \frac{N_h(N_h - n_h)}{n_h}\,S_{uh}^2$$

where

$$S_{uh}^2 = \frac{1}{(N_h - 1)}\sum_{i=1}^{N_h}(u_{hi} - \bar{U}_h)^2 = \frac{1}{(N_h - 1)}\sum_{i=1}^{N_h}\{(y_{hi} - \bar{Y}_h) - R(x_{hi} - \bar{X}_h)\}^2$$

When the quadratic is expanded, result (6.22) is obtained.

From equations (6.21) and (6.22) it is interesting to note that the
approximate variances of $\hat{Y}_{Rs}$ and $\hat{Y}_{Rc}$ assume the same general form,
the only difference being that the population ratios $R_h$ in the individual
strata in (6.21) are all replaced by $R$ in (6.22).

Comparison of the two estimates. We may write

$$V(\hat{Y}_{Rc}) - V(\hat{Y}_{Rs}) = \sum_h \frac{N_h(N_h - n_h)}{n_h}\,[(R^2 - R_h^2)S_{xh}^2 - 2(R - R_h)\rho_hS_{yh}S_{xh}]$$

$$= \sum_h \frac{N_h(N_h - n_h)}{n_h}\,[(R - R_h)^2S_{xh}^2 + 2(R_h - R)(\rho_hS_{yh}S_{xh} - R_hS_{xh}^2)]$$

In situations in which the ratio estimate is appropriate, the last
term on the right is usually small. (It vanishes if within each stratum
the relation between $y_{hi}$ and $x_{hi}$ is a straight line through the origin.)
Thus, unless $R_h$ is constant from stratum to stratum, the use of a
separate ratio estimate in each stratum is likely to be more precise.
This discussion assumes, however, that the sample in each stratum is
large enough so that the approximate formula for $V(\hat{Y}_{Rs})$ is valid.
With only a small sample in each stratum, the combined estimate is
to be recommended unless there is good empirical evidence to the
contrary.

For sample estimates of these variances we substitute sample
estimates of $R_h$ and $R$ in the appropriate places. The sample mean squares
$s_{yh}^2$ and $s_{xh}^2$ are substituted for the corresponding variances, and the
sample covariance for the term $\rho_hS_{yh}S_{xh}$. The sample mean square
and covariance must be calculated separately for each stratum.

Example. The data come from a census of all farms in Jefferson
County, Iowa. In this example $y_{hi}$ represents acres in corn, and $x_{hi}$
acres in the farm. The population is divided into two strata, the first
stratum containing farms of size up to 160 acres. We assume a sample
of 100 farms. When stratified sampling is used, we shall suppose
that 70 farms are taken from stratum 1 and 30 from stratum 2, this
being roughly the optimum allocation. The data are given in table 6.3.


TABLE 6.3 DATA FROM JEFFERSON COUNTY, IOWA

Strata | Size (farm acres) | $N_h$ | $S_{yh}^2$ | $S_{yxh}$ | $S_{xh}^2$ | $R_h$
1 | 0-160 | 1580 | 312 | 494 | 2055 | 0.2350
2 | Over 160 | 430 | 922 | 858 | 7357 | 0.2109
For complete pop. | | 2010 | 620 | 1453 | 7619 | 0.2242

Strata | $\bar{Y}_h$ | $\bar{X}_h$ | $n_h$ | $Q_h = W_h^2/n_h$ | $V_h'$ | $V_h''$
1 | 19.40 | 82.56 | 70 | 0.008828 | 193 | 194
2 | 51.63 | 244.85 | 30 | 0.001525 | 887 | 907
For complete pop. | 26.30 | 117.28 | 100 | | |


The last three quantities, $Q_h$, $V_h'$, and $V_h''$, are auxiliary quantities
to be used in the computations, the last two being defined later.
We consider five methods of estimating the population mean corn
acres per farm. The fpc will be ignored.
i. Simple random sample: mean per farm estimate.

$$V_1 = \frac{S_y^2}{n} = \frac{620}{100} = 6.20$$

ii. Simple random sample: ratio estimate.

$$V_2 = \frac{1}{n}\,[S_y^2 + R^2S_x^2 - 2RS_{yx}] = \frac{1}{100}\,[620 + (0.2242)^2(7619) - 2(0.2242)(1453)] = 3.51$$

iii. Stratified random sample: mean per farm estimate.

$$V_3 = \sum_h \frac{W_h^2S_{yh}^2}{n_h} = \sum_h Q_hS_{yh}^2 = 4.16$$

iv. Stratified random sample: ratio estimate using a separate ratio
in each stratum.

$$V_4 = \sum_h Q_h[S_{yh}^2 + R_h^2S_{xh}^2 - 2R_hS_{yxh}] = \sum_h Q_hV_h' = 3.06$$

v. Stratified random sampling: ratio estimate using a combined
ratio.

$$V_5 = \sum_h Q_h[S_{yh}^2 + R^2S_{xh}^2 - 2RS_{yxh}] = \sum_h Q_hV_h'' = 3.10$$

The relative precisions of the various methods can be summarized
as follows:

Sampling method | Method of estimation | Relative precision
(i) Simple random | Mean per farm | 100
(ii) Simple random | Ratio | 177
(iii) Stratified random | Mean per farm | 149
(iv) Stratified random | Separate ratio | 203
(v) Stratified random | Combined ratio | 200

The results bring out an interesting point of wide application. 
Stratification by size of farm accomplishes the same general purpose as 
a ratio estimate in which the denominator is farm size. Both devices 
diminish the effect of variations in farm size on the sampling error of 
the estimated mean corn acres per farm. For instance, the gain in 
precision from a ratio estimate is 77 per cent when simple random 
sampling is used, but is only 36 per cent (203 against 149) when strati- 
fied sampling is used. 
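The five variances in this example can be recomputed from the entries of table 6.3. The Python sketch below is a minimal illustration with the fpc ignored, as in the text; the dictionary layout and function names are choices made for the sketch. Because it carries unrounded intermediate quantities, the last digit may differ slightly from the figures in the text, which use the rounded values of $Q_h$, $V_h'$, and $V_h''$.

```python
# Entries of table 6.3 (Jefferson County, Iowa).
strata = [
    {"N": 1580, "n": 70, "S_y2": 312.0, "S_yx": 494.0, "S_x2": 2055.0, "R": 0.2350},
    {"N": 430, "n": 30, "S_y2": 922.0, "S_yx": 858.0, "S_x2": 7357.0, "R": 0.2109},
]
pop = {"N": 2010, "n": 100, "S_y2": 620.0, "S_yx": 1453.0, "S_x2": 7619.0, "R": 0.2242}


def residual_variance(s, ratio):
    """S_y^2 + R^2 S_x^2 - 2 R S_yx for one stratum (or the whole population)."""
    return s["S_y2"] + ratio ** 2 * s["S_x2"] - 2.0 * ratio * s["S_yx"]


def q(s):
    """Q_h = W_h^2 / n_h, with W_h = N_h / N."""
    return (s["N"] / pop["N"]) ** 2 / s["n"]


v1 = pop["S_y2"] / pop["n"]                                       # SRS, mean per farm
v2 = residual_variance(pop, pop["R"]) / pop["n"]                  # SRS, ratio
v3 = sum(q(s) * s["S_y2"] for s in strata)                        # stratified, mean per farm
v4 = sum(q(s) * residual_variance(s, s["R"]) for s in strata)     # separate ratio
v5 = sum(q(s) * residual_variance(s, pop["R"]) for s in strata)   # combined ratio

for label, v in zip(["i", "ii", "iii", "iv", "v"], [v1, v2, v3, v4, v5]):
    print(label, round(v, 2), round(100.0 * v1 / v))
```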

In the design of samples there may be a choice whether to introduce 
some factor into the stratification, or to utilize it in the method of 
estimation. The best decision depends on the circumstances. Rele- 
vant points are: (i) some factors, e.g. geographical location, are more 
easily introduced into the stratification than into the method of esti- 
mation; (ii) the issue depends on the nature of the relation between 
$y_i$ and $x_i$. All simple methods of estimation work most effectively
with a linear relation. With a complex or discontinuous relation,
stratification may be more effective, since, if there are enough strata,
stratification will eliminate the effects of almost any kind of relation
between $y_i$ and $x_i$.

6.13 Optimum allocation with a ratio estimate. The optimum
allocation of the $n_h$ may be different when a ratio estimate is used from
that when a mean per unit is used. Consider first the variate $\hat{Y}_{Rs}$.
From theorem 6.5 its variance is

$$V(\hat{Y}_{Rs}) = \sum_h \frac{N_h(N_h - n_h)}{n_h}\,[S_{yh}^2 + R_h^2S_{xh}^2 - 2R_h\rho_hS_{yh}S_{xh}]$$

$$= \sum_h \frac{N_h(N_h - n_h)}{n_h}\,S_{dh}^2, \qquad \text{with } S_{dh}^2 = \frac{1}{(N_h - 1)}\sum_{i=1}^{N_h} d_{hi}^2 \qquad (6.24)$$

where $d_{hi} = y_{hi} - R_hx_{hi}$ is the deviation of $y_{hi}$ from $R_hx_{hi}$. By the
methods given in chapter 5 for finding optimum allocation, it follows
that (6.24) is minimized subject to a total cost of the form $\sum c_hn_h$
when

$$n_h \propto \frac{N_hS_{dh}}{\sqrt{c_h}}$$

With a mean per unit it will be recalled that for minimum variance
$n_h$ is chosen proportional to $N_hS_{yh}/\sqrt{c_h}$.
In the planning of a sample, the allocation with a ratio estimate
may appear a little perplexing, because it seems difficult to speculate
about the likely values of $S_{dh}$. Two rules are helpful. With a population
in which the ratio estimate is a best linear unbiased estimate, $S_{dh}$
will be roughly proportional to $\sqrt{\bar{X}_h}$ (by theorem 6.3). In this case
the $n_h$ should be proportional to $N_h\sqrt{\bar{X}_h}/\sqrt{c_h}$. Sometimes the
variance of $d_{hi}$ may be more nearly proportional to $\bar{X}_h^2$. This leads to
the allocation of $n_h$ proportional to $N_h\bar{X}_h/\sqrt{c_h}$, i.e. to the stratum
total of $x_{hi}$ divided by the square root of the cost per unit. An example
of this type is discussed by Hansen, Hurwitz, and Gurney (1946),
for a sample designed to estimate sales of retail stores.
If the estimate $\hat{Y}_{Rc}$ is to be used, the same general argument applies.
Example. The different methods of allocation can be compared
from data collected in a complete enumeration of 257 commercial
peach orchards in North Carolina in June 1946 (Finkner, 1950). The
purpose of this survey was to determine the most efficient sampling
procedure for estimating commercial peach production in this area.
Information was obtained on the number of peach trees and the
estimated total peach production in each orchard. The high correlation
between these two variables suggested the use of a ratio estimate.
One very large orchard was omitted.

For this illustration, the area is divided geographically into three
strata. The number of peach trees in an orchard is denoted by $x_{hi}$,
and the estimated production in bushels of peaches by $y_{hi}$. Only the
first ratio estimate $\hat{Y}_{Rs}$ (based on a separate ratio in each stratum)
will be considered, since the principle is the same for both types of
stratified ratio estimate.

Four different methods of allocation will be compared: (i) $n_h$
proportional to $N_h$, (ii) $n_h$ proportional to $N_hS_{yh}$, (iii) $n_h$ proportional to
$N_h\sqrt{\bar{X}_h}$, and (iv) $n_h$ proportional to $N_h\bar{X}_h = X_h$. A sample size of
100 will be considered. The data needed for these comparisons are
summarized in table 6.4.


TABLE 6.4 DATA FROM THE NORTH CAROLINA PEACH SURVEY

Strata | $S_{xh}^2$ | $S_{yxh}$ | $S_{yh}^2$ | $S_{xh}$ | $S_{yh}$ | $\bar{X}_h$ | $\bar{Y}_h$ | $R_h$ | $S_{dh}^2$
1 | 5,186 | 6,462 | 8,699 | 72.01 | 93.27 | 53.80 | 69.48 | 1.29133 | 658
2 | 2,367 | 3,100 | 4,614 | 48.65 | 67.93 | 31.07 | 43.64 | 1.40475 | 573
3 | 4,877 | 4,817 | 7,311 | 69.83 | 85.51 | 56.97 | 66.39 | 1.16547 | 2,706
Pop. | 3,898 | 4,434 | 6,409 | 62.43 | 80.06 | 44.45 | 56.47 | 1.27053 | 1,433

Strata | $N_h$ | $n_h$ (i) | $N_hS_{yh}$ | $n_h$ (ii) | $\sqrt{\bar{X}_h}$ | $N_h\sqrt{\bar{X}_h}$ | $n_h$ (iii) | $N_h\bar{X}_h$ | $n_h$ (iv)
1 | 47 | 18 | 4,384 | 22 | 7.33 | 344.5 | 20 | 2,529 | 22
2 | 118 | 46 | 8,016 | 40 | 5.57 | 657.3 | 39 | 3,666 | 32
3 | 91 | 36 | 7,781 | 38 | 7.55 | 687.1 | 41 | 5,184 | 46
Pop. | 256 | 100 | 20,181 | 100 | 20.45 | 1,688.9 | 100 | 11,379 | 100

The upper part of the table shows the basic data. The method
employed to calculate the four variances was first to find the $n_h$ for
each type of allocation. These values are shown in the columns headed
(i)-(iv) in the lower part of the table. Thus, with allocation i, $n_h =
nN_h/N$, so that in the first stratum

$$n_1 = \frac{(100)(47)}{256} = 18$$

When the $n_h$ have been obtained, the corresponding $V(\hat{Y}_{Rs})$ is
found by substituting in the formula

$$V(\hat{Y}_{Rs}) = \sum_h \frac{N_h(N_h - n_h)}{n_h}\,S_{dh}^2$$

where

$$S_{dh}^2 = S_{yh}^2 + R_h^2S_{xh}^2 - 2R_hS_{yxh}$$

The quantities $S_{dh}^2$ are the same for all four allocations and are given
on the extreme right of the top half of table 6.4.
The variances and relative precisions are shown in table 6.5.


TABLE 6.5 COMPARISON OF FOUR METHODS OF ALLOCATION

Method of allocation: $n_h$ proportional to | Variance, stratum 1 | Variance, stratum 2 | Variance, stratum 3 | Variance, total | Relative precision
(i) $N_h$ | 49,824 | 105,833 | 376,215 | 531,872 | 100
(ii) $N_hS_{yh}$ | 35,144 | 131,847 | 343,446 | 510,437 | 104
(iii) $N_h\sqrt{\bar{X}_h}$ | 41,750 | 136,964 | 300,312 | 479,026 | 111
(iv) $N_h\bar{X}_h$ | 35,144 | 181,710 | 240,888 | 457,742 | 116


There is not much to choose among the different allocations, as
would be expected since the $n_h$ do not differ greatly in the four methods.
Method iv, in which allocation is proportional to the total number of
peach trees in the stratum, appears a trifle superior to the others.
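The allocations and variances in tables 6.4 and 6.5 can be retraced with a short Python sketch. Two points are assumptions of the sketch rather than statements in the text: the $n_h$ are rounded to whole numbers by the largest-remainder rule so that they sum to 100 (the text does not say how the rounding was done, although this rule reproduces the tabulated $n_h$), and the rounded $S_{dh}^2$ from table 6.4 are used, so the variances differ slightly from those in table 6.5.

```python
import math

# Stratum data from table 6.4.
N_h = [47, 118, 91]
S_yh = [93.27, 67.93, 85.51]
Xbar_h = [53.80, 31.07, 56.97]
S_dh2 = [658.0, 573.0, 2706.0]


def allocate(n, sizes):
    """Split n in proportion to `sizes`, rounding by largest remainders
    (an assumed rounding rule) so that the parts sum to n."""
    total = sum(sizes)
    shares = [n * s / total for s in sizes]
    alloc = [math.floor(s) for s in shares]
    leftover = n - sum(alloc)
    by_remainder = sorted(range(len(sizes)),
                          key=lambda h: shares[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


def variance_separate_ratio(n_h):
    """V(Y_Rs) = sum N_h (N_h - n_h) / n_h * S_dh^2."""
    return sum(N * (N - n) / n * s2 for N, n, s2 in zip(N_h, n_h, S_dh2))


measures = {
    "i": [float(N) for N in N_h],
    "ii": [N * s for N, s in zip(N_h, S_yh)],
    "iii": [N * math.sqrt(x) for N, x in zip(N_h, Xbar_h)],
    "iv": [N * x for N, x in zip(N_h, Xbar_h)],
}
allocations = {k: allocate(100, v) for k, v in measures.items()}
variances = {k: variance_separate_ratio(a) for k, a in allocations.items()}
relative_precision = {k: round(100.0 * variances["i"] / v) for k, v in variances.items()}

for k in measures:
    print(k, allocations[k], round(variances[k]), relative_precision[k])
```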


6.14 Exercises. 


6.1 In a field of barley the grain, $y_i$, and the grain plus straw, $x_i$, were
weighed for each of a large number of sampling units located at random over
the field. The total produce (grain plus straw) of the whole field was also
weighed. The following data were obtained:

$$c_{yy} = 1.18, \qquad c_{yx} = 0.78, \qquad c_{xx} = 1.11$$

Compute the gain in precision obtained by estimating the grain yield of the
field from the ratio of grain to total produce instead of from the mean yield
of grain per unit.

It requires 20 min to cut, thresh, and weigh the grain on each unit, 2 min
to weigh the straw on each unit, and 2 hr to collect and weigh the total produce
of the field. How many units must be taken per field in order that the ratio
estimate be more economical than the mean per unit?

6.2 For the data in table 6.1, $\hat{Y}_R = 28,367$ and

$$c_{yy} = 0.0142668, \qquad c_{yx} = 0.0146541, \qquad c_{xx} = 0.0156830$$

Compute the 95 per cent quadratic confidence limits for $Y$ and compare them
with the limits found by the normal approximation.

6.3 For the sample of 30 households in table 6.2, the following data refer
to visits to the dentist in the last year:

Columns (two side-by-side panels): No. of persons | Dentist seen: Yes | Dentist seen: No

[The household rows are not legible in this copy.]

Estimate the variance of the proportion of persons who saw a dentist, and 
compare this with the binomial estimate of the variance. 

6.4 The following data are for a small artificial population with $N = 8$
and two strata of equal size:

      Stratum 1              Stratum 2
  $x_{1i}$    $y_{1i}$       $x_{2i}$    $y_{2i}$
     2          0             10          7
     5          3             18         15
     9          7             21         10
    15         10             25         16

For a stratified random sample with $n_1 = n_2 = 2$, compare the variances of
$\hat{Y}_{Rs}$ and $\hat{Y}_{Rc}$ by working out the results for all possible samples. To what
extent is the difference in variances due to biases in the estimates?

CHAPTER 7
REGRESSION ESTIMATES


7.1 The linear regression estimate. Like the ratio estimate, the linear
regression estimate is designed to increase precision by the use of an
auxiliary variate $x_i$ which is correlated with $y_i$. When the relation
between $y_i$ and $x_i$ is examined, it may be found that, although the
relation is approximately linear, the line does not go through the
origin. This suggests an estimate based on the linear regression of $y_i$
on $x_i$ rather than on the ratio of the two variables.

We suppose that $y_i$ and $x_i$ are each obtained for every unit in the
sample, and that the population mean $\bar{X}$ of the $x_i$ is known. The
sample regression of $y_i$ on $x_i$ is computed. For the present we assume
that the least squares regression coefficient $b$ is used, where

$$b = \frac{\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

The linear regression estimate of $\bar{Y}$, the population mean of the $y_i$,
is

$$\bar{y}_{lr} = \bar{y} + b(\bar{X} - \bar{x}) \qquad (7.1)$$

where the subscript $lr$ denotes linear regression. The rationale of this
estimate is that, if $\bar{x}$ is below average, we should expect $\bar{y}$ also to be
below average by an amount $b(\bar{X} - \bar{x})$, because of the regression of
$y_i$ on $x_i$. For an estimate of the population total $Y$, we take $\hat{Y}_{lr} =
N\bar{y}_{lr}$.

Watson (1937) used a regression of leaf area on leaf weight to estimate
the average area of the leaves on a plant. The procedure was
to weigh all the leaves on the plant. For a small sample of leaves, the
area and the weight of each leaf were determined. The sample mean
leaf area was then adjusted by means of the regression on leaf weight.
The point of the application is, of course, that the weight of a leaf
can be found quickly, but determination of its area is more
time-consuming.

This example illustrates a general situation in which regression
estimates are potentially helpful. Suppose that we can make a rapid
estimate $x_i$ of some characteristic for every unit, and can also, by some
more costly method, determine the correct value $y_i$ of the characteristic
for a simple random sample of the units. For instance, a rat
expert might make a quick eye estimate of the number of rats in each
block in a city area, and then determine, by trapping, the actual number
of rats in each of a simple random sample of the blocks. In another
application described by Yates (1949), an eye estimate of the volume
of timber was made on each of a number of 1/10-acre plots, and the
actual timber volume was measured for a sample of the plots. The
regression estimate

$$\bar{y} + b(\bar{X} - \bar{x})$$

adjusts the sample mean of the actual measurements by the regression
of the actual measurements on the rapid estimates. The rapid
estimates need not be free from bias. If $x_i - y_i = D$, so that the
rapid estimate is perfect except for a constant bias $D$, it may be
verified that $b = 1$ and the regression estimate becomes

$$\bar{y} + (\bar{X} - \bar{x}) = \bar{X} + (\bar{y} - \bar{x})$$
$$= (\text{Pop. mean of rapid estimate}) + (\text{Adjustment for bias})$$


Our knowledge of the properties of the regression estimate is of 
about the same scope as our knowledge for the ratio estimate. The 
regression estimate is consistent, although this is in the trivial sense 
that, when the sample comprises the whole population, = = X, and 
the regression estimate makes no adjustment. As will be shown, the 
estimate is in general biased, but the ratio of the bias to the standard 
error becomes small when the sample is large. We possess a large- 
sample formula for the variance of the estimate, but more information 
is needed about the distribution of the estimate in small samples and 
about the value of n required for the practical use of large-sample 
results. 
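As an illustration, the computation in (7.1) may be sketched in Python. The function name, its arguments, and the data are hypothetical and are not taken from the text; the data are chosen so that the rapid estimates exceed the true values by a constant bias D = 2, in which case b should come out as 1.

```python
from statistics import mean

def regression_estimate(y, x, x_pop_mean, b=None):
    """Linear regression estimate of the population mean, equation (7.1).

    y, x       : paired sample values
    x_pop_mean : known population mean of x
    b          : slope to use; if None, the least squares slope is computed
    """
    y_bar, x_bar = mean(y), mean(x)
    if b is None:
        sxy = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(y, x))
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        b = sxy / sxx
    return y_bar + b * (x_pop_mean - x_bar)

# Hypothetical data: rapid estimates x overshoot the true values y by D = 2.
y_sample = [10.0, 14.0, 9.0, 17.0]
x_sample = [yi + 2.0 for yi in y_sample]
x_pop_mean = 15.0  # assumed known, since x is recorded for every unit

estimate = regression_estimate(y_sample, x_sample, x_pop_mean)
print(estimate)  # population mean of x, less the constant bias
```

With these values the estimate equals the population mean of the rapid estimates minus D, in agreement with the "adjustment for bias" form given above.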


7.2 Large-sample theory. The theory of linear regression plays a
prominent part in most courses in elementary statistics. The standard
results of this theory are not entirely suitable for sample surveys,
because they require the restrictive assumptions that the population
regression of $y_i$ on $x_i$ is linear, and that the residual variance of $y_i$
about the regression line is constant. If these two assumptions are
violently wrong, a linear regression estimate will probably not be
precise, and an estimate based on a curvilinear regression or a weighted
linear regression is preferable. There are situations, however, in
which we doubt whether the gain in precision from these more elaborate
methods would be worth the labor, and there are others in which,
although we have reason to believe that the regression is linear, we
do not have good evidence that it actually is.

Consequently we first present a theory which does not assume that
the regression is linear in the population, and which gives results that
hold only in large samples. This theory is analogous to the
large-sample theory for the ratio estimate.


The finite population linear regression coefficient, $B$, is defined by
the relation

$$B = \frac{\sum_{i=1}^{N} (y_i - \bar{Y})(x_i - \bar{X})}{\sum_{i=1}^{N} (x_i - \bar{X})^2} \tag{7.2}$$

The residual variate, $e_i$, is defined by the relation

$$y_i = \bar{Y} + B(x_i - \bar{X}) + e_i \tag{7.3}$$

Adding (7.3) over all units in the population, we find

$$\sum_{i=1}^{N} e_i = 0 \tag{7.4}$$

Note that no linear relation between $y_i$ and $x_i$ is assumed. The
population consists of a set of $N$ pairs of values $(y_i, x_i)$, and the
apparent linear regression in equation (7.3) has been constructed by our
definitions of $B$ and the $e_i$.
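The construction in (7.2) to (7.4) can be checked numerically. The sketch below uses a hypothetical population of six pairs in which the relation is deliberately curved; the function name and the data are illustrative assumptions only.

```python
from statistics import mean

def population_regression(y, x):
    """Return B of (7.2) and the residuals e_i of (7.3) for a finite population."""
    Y_bar, X_bar = mean(y), mean(x)
    B = (sum((yi - Y_bar) * (xi - X_bar) for yi, xi in zip(y, x))
         / sum((xi - X_bar) ** 2 for xi in x))
    e = [yi - Y_bar - B * (xi - X_bar) for yi, xi in zip(y, x)]
    return B, e

# Hypothetical population of N = 6 pairs; y is far from linear in x.
x_pop = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0]
y_pop = [2.0, 3.0, 9.0, 12.0, 30.0, 41.0]

B, e = population_regression(y_pop, x_pop)
print(B, sum(e))  # the residuals sum to zero apart from rounding error
```

The residuals sum to zero, and are uncorrelated with the $x_i$, whether or not the population relation is linear; both properties follow from the definitions alone.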


Theorem 7.1 For a simple random sample, with $n$ large enough so
that sampling errors in the sample regression coefficient $b$ can be
ignored,

$$V(\bar{y}_{lr}) = \frac{(N - n)}{N} \frac{S_e^2}{n} \tag{7.5}$$

where

$$S_e^2 = \frac{\sum_{i=1}^{N} e_i^2}{N - 1}$$

Proof: By its definition (7.1),

$$\bar{y}_{lr} = \bar{y} + b(\bar{X} - \bar{x}) \tag{7.1}$$

But from (7.3), averaged over the units in the sample,

$$\bar{y} = \bar{Y} + B(\bar{x} - \bar{X}) + \bar{e}$$

Substitute into (7.1). This gives

$$\bar{y}_{lr} = \bar{Y} + (b - B)(\bar{X} - \bar{x}) + \bar{e} \tag{7.6}$$

Hence, if the sampling error $(b - B)$ can be ignored,

$$\bar{y}_{lr} - \bar{Y} = \bar{e}$$

Thus

$$V(\bar{y}_{lr}) = E(\bar{y}_{lr} - \bar{Y})^2 = E(\bar{e}^2)$$

But, since $E(\bar{e})$ is zero by equation (7.4), $E(\bar{e}^2)$ is the variance of
the mean of the quantities $e_i$ in a simple random sample. Hence, by
theorem 2.3,

$$V(\bar{y}_{lr}) = \frac{(N - n)}{N} \frac{S_e^2}{n}$$

Corollary. If the correlation coefficient $\rho$ between $y_i$ and $x_i$ in the
finite population is defined by the relation

$$\rho = \frac{\sum (y_i - \bar{Y})(x_i - \bar{X})}{\sqrt{\sum (y_i - \bar{Y})^2 \sum (x_i - \bar{X})^2}} \tag{7.7}$$

where the sums extend over all units in the population, then

$$V(\bar{y}_{lr}) = \frac{(N - n)}{Nn} S_y^2 (1 - \rho^2) \tag{7.8}$$

Proof: From equation (7.3), summing over the population,

$$\sum e_i^2 = \sum \{(y_i - \bar{Y}) - B(x_i - \bar{X})\}^2$$
$$= \sum (y_i - \bar{Y})^2 - 2B \sum (y_i - \bar{Y})(x_i - \bar{X}) + B^2 \sum (x_i - \bar{X})^2$$
$$= \sum (y_i - \bar{Y})^2 - B^2 \sum (x_i - \bar{X})^2$$

by the definition of $B$, equation (7.2);

$$= \sum (y_i - \bar{Y})^2 (1 - \rho^2)$$

from the definition of $\rho$, equation (7.7).
Hence

$$S_e^2 = S_y^2 (1 - \rho^2)$$

Theorem 7.2 For a simple random sample, with $n$ large enough so
that sampling errors in the sample regression coefficient are negligible,
an unbiased estimate of $V(\bar{y}_{lr})$ from the sample is

$$v(\bar{y}_{lr}) = \frac{(N - n)}{Nn(n - 1)} \sum_{i=1}^{n} \{(y_i - \bar{y}) - b(x_i - \bar{x})\}^2 \tag{7.9}$$

Proof: From theorem 2.4, an unbiased estimate of $S_e^2$ from the
sample is

$$\frac{\sum_{i=1}^{n} (e_i - \bar{e})^2}{(n - 1)}$$

Now, from equation (7.3),

$$e_i - \bar{e} = (y_i - \bar{y}) - B(x_i - \bar{x})$$
$$= \{(y_i - \bar{y}) - b(x_i - \bar{x})\} + (b - B)(x_i - \bar{x})$$

If sampling errors in $b$ are negligible, the last term vanishes, and

$$\sum_{i=1}^{n} (e_i - \bar{e})^2 = \sum_{i=1}^{n} \{(y_i - \bar{y}) - b(x_i - \bar{x})\}^2$$

Hence, for $v(\bar{y}_{lr})$ defined as in equation (7.9),

$$E\,v(\bar{y}_{lr}) = \frac{(N - n)}{Nn} S_e^2 = V(\bar{y}_{lr})$$

by equation (7.5).

Theorems 7.1 and 7.2 do not specify how the sample regression
coefficient $b$ is to be computed. If $b$ is the least squares regression
coefficient, the sum of squares in $v(\bar{y}_{lr})$ is most quickly calculated in the
form

$$\sum (y_i - \bar{y})^2 - \frac{\{\sum (y_i - \bar{y})(x_i - \bar{x})\}^2}{\sum (x_i - \bar{x})^2}$$

If $b$ is not the least squares regression coefficient, the preceding
formula does not hold. The sum of squares can be computed as

$$\sum (y_i - \bar{y})^2 - 2b \sum (y_i - \bar{y})(x_i - \bar{x}) + b^2 \sum (x_i - \bar{x})^2$$
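The two computing forms for the sum of squares in (7.9) may be sketched as follows. This is an illustration only: the function names and the sample of five units from a population of N = 50 are hypothetical.

```python
from statistics import mean

def sums_of_squares(y, x):
    """Corrected sums of squares and products about the sample means."""
    y_bar, x_bar = mean(y), mean(x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(y, x))
    return syy, sxx, sxy

def variance_estimate(y, x, N, b=None):
    """v(y_lr) of (7.9) for a simple random sample of n units out of N.

    With b=None the least squares slope is used and the short form of
    the residual sum of squares applies; otherwise the general
    expansion is used with the supplied b.
    """
    n = len(y)
    syy, sxx, sxy = sums_of_squares(y, x)
    if b is None:
        residual_ss = syy - sxy ** 2 / sxx              # least squares b only
    else:
        residual_ss = syy - 2 * b * sxy + b ** 2 * sxx  # any b
    return (N - n) / (N * n * (n - 1)) * residual_ss

# Hypothetical sample of n = 5 units from a population of N = 50.
y_s = [12.0, 15.0, 9.0, 20.0, 14.0]
x_s = [10.0, 13.0, 8.0, 17.0, 12.0]
v_ls = variance_estimate(y_s, x_s, N=50)          # least squares slope
v_b1 = variance_estimate(y_s, x_s, N=50, b=1.0)   # slope fixed at 1
print(v_ls, v_b1)
```

Because the least squares slope minimizes the residual sum of squares, any other choice of b, such as b = 1 here, cannot give a smaller value of the estimated variance.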


this the population is assumed infinite,
and $x_i$ is

$$y_{ij} = \bar{Y} + B(x_i - \bar{X}) + e_{ij} \tag{7.3'}$$

Formally, this is the same as equation (7.3), except that an extra
subscript $j$ has been added as a reminder that in standard regression theory
there is a whole frequency distribution or array of values of $y_{ij}$ and
$e_{ij}$ for each value of $x_i$. The theory assumes simple random sampling,
and further that, in every array in which $x_i$ is fixed,

$$E(e_{ij}) = 0 : \quad E(e_{ij}^2) = S_e^2 = \text{Constant}$$

From this model, by the same analysis as in theorem 7.1, we obtain

$$\bar{y}_{lr} - \bar{Y} = \bar{e} + (b - B)(\bar{X} - \bar{x}) \tag{7.10}$$

Now, if $b$ is the least squares regression coefficient,

$$b = \frac{\sum (y_i - \bar{y})(x_i - \bar{x})}{\sum (x_i - \bar{x})^2}$$

where the extra subscript $j$ has been dropped. Substitution for $y_i$
and $\bar{y}$ from equation (7.3') gives

$$b = B + \frac{\sum e_i (x_i - \bar{x})}{\sum (x_i - \bar{x})^2} \tag{7.11}$$

Hence

$$b - B = \frac{\sum e_i (x_i - \bar{x})}{\sum (x_i - \bar{x})^2} \tag{7.12}$$

In repeated samples in which the $x_i$ remain fixed from sample to sample,
it is easy to verify that the covariance of $\bar{e}$ and $(b - B)$ is zero,
and that

$$V(b) = E(b - B)^2 = \frac{S_e^2}{\sum (x_i - \bar{x})^2}$$

Hence, from (7.10),

$$V(\bar{y}_{lr}) = E(\bar{y}_{lr} - \bar{Y})^2 = \frac{S_e^2}{n} + \frac{(\bar{X} - \bar{x})^2 S_e^2}{\sum (x_i - \bar{x})^2}$$

$$= S_e^2 \left[ \frac{1}{n} + \frac{(\bar{x} - \bar{X})^2}{\sum (x_i - \bar{x})^2} \right] \tag{7.13}$$

This is a standard result in elementary regression theory. It is exact
for any size of sample, subject to the assumptions stated previously.*
Under these assumptions, $\bar{y}_{lr}$ is an unbiased estimate of $\bar{Y}$. This
can be shown by considering repeated samples in which the $x_i$ remain
fixed from sample to sample. Since, by hypothesis, $E(e_{ij}) = 0$ in such
samples, it follows that $E(\bar{e}) = 0$ and from equation (7.12) that
$E(b - B) = 0$. Referring now to equation (7.10), we see that $\bar{y}_{lr}$ is
unbiased in repeated samples in which the $x_i$ remain fixed, and hence
is unbiased in repeated simple random sampling.

* Since the variance of $\bar{y}_{lr}$ applies to repeated samples in which the values of
the $x$'s remain fixed, this result is another instance of the use of conditional
distributions, which helps to simplify the mathematical analysis.

These results lead to an alternative form of theorem 7.1.

Theorem 7.1a Under the assumptions stated at the beginning of
this section, $\bar{y}_{lr}$ is an unbiased estimate of $\bar{Y}$, with variance as given
by equation (7.13).

The term involving the $x_i$ in equation (7.13) is the contribution
from the sampling error of $b$. The average value of this term in simple
random samples of size $n$ depends on the shape of the frequency
distribution of the $x_i$. If this distribution is normal, it may be shown
that

$$E\left[ \frac{(\bar{x} - \bar{X})^2}{\sum (x_i - \bar{x})^2} \right] = \frac{1}{n(n - 3)}$$

When the $x_i$ are not normally distributed, the average of the term in
the $x_i$ may be expanded in a series of inverse powers of $n$ by Fisher's
method of cumulants (1928). The leading term in the series is found
by replacing $\sum (x_i - \bar{x})^2$ by $(n - 1)S_x^2$ as an approximation. This
gives

$$\frac{E(\bar{x} - \bar{X})^2}{(n - 1)S_x^2} = \frac{S_x^2}{n(n - 1)S_x^2} = \frac{1}{n(n - 1)} \doteq \frac{1}{n^2}$$

to this order of approximation.
Hence, to terms of order $1/n^2$,

$$E\{V(\bar{y}_{lr})\} = \frac{S_e^2}{n} \left( 1 + \frac{1}{n} \right) \tag{7.14}$$

This result indicates that if $n$ exceeds 50, the contribution of sampling
errors in the least squares $b$ is negligible.

This result is subject to the assumption that the population regression
is linear. When this regression is non-linear, the contribution
of the sampling error of $b$ to $V(\bar{y}_{lr})$ can be expanded in a series of
inverse powers of $n$. The leading term is of order $1/n^2$, as in equation
(7.14), but the numerator is a function of certain moments of the
joint distribution of $e_i$ and $x_i$ (Cochran, 1942).

To complete the elementary theory, another standard result, which
is valid for any size of sample, is that

$$\frac{1}{(n - 2)} \sum \{(y_i - \bar{y}) - b(x_i - \bar{x})\}^2 \tag{7.15}$$

is an unbiased estimate of $S_e^2$. This differs from the large-sample
result given in theorem 7.2 only in that the divisor is $(n - 2)$ here, as
against $(n - 1)$ in theorem 7.2, and that no fpc is included.


7.4 Bias of the regression estimate. If the relation between $y_i$ and
$x_i$ is non-linear, $\bar{y}_{lr}$ is subject to a bias of order $1/n$. We shall resume
the finite population model of section 7.2. From equation (7.6), the
error of estimate is

$$\bar{y}_{lr} - \bar{Y} = \bar{e} + (b - B)(\bar{X} - \bar{x})$$

If $b$ is the least squares estimate of $B$, then by equation (7.12)

$$b - B = \frac{\sum e_i (x_i - \bar{x})}{\sum (x_i - \bar{x})^2}$$

Hence

$$\bar{y}_{lr} - \bar{Y} = \bar{e} - (\bar{x} - \bar{X}) \frac{\sum e_i (x_i - \bar{x})}{\sum (x_i - \bar{x})^2}$$

In repeated simple random samples $E(\bar{e}) = 0$, since the population
mean of the $e_i$ is zero. The average value of the second term on the
right can be expanded in a series of inverse powers of $n$. The leading
term is obtained by the following non-rigorous argument.

We may replace the denominator, $\sum (x_i - \bar{x})^2$, by the approximation
$(n - 1)S_x^2$. The numerator may be written

$$-(\bar{x} - \bar{X}) \left\{ \sum e_i (x_i - \bar{X}) - \sum e_i (\bar{x} - \bar{X}) \right\} \tag{7.16}$$

Let $u_i$ be the variate $e_i(x_i - \bar{X})$. Then

$$\sum_{i=1}^{N} u_i = \sum_{i=1}^{N} e_i (x_i - \bar{X})$$

$$= \sum_{i=1}^{N} \{(y_i - \bar{Y}) - B(x_i - \bar{X})\}(x_i - \bar{X}) = 0$$

by the definition of $B$ for the finite population. Hence the population
mean $\bar{U} = 0$. Consequently the average value of the first term in
(7.16) may be written

$$-nE(\bar{x} - \bar{X})(\bar{u} - \bar{U}) = -\frac{(N - n)}{N(N - 1)} \sum_{i=1}^{N} (x_i - \bar{X})(u_i - \bar{U})$$

by theorem 2.3 (p. 17);

$$= -\frac{(N - n)}{N} \frac{\sum_{i=1}^{N} e_i (x_i - \bar{X})^2}{(N - 1)}$$

The average of the second term in (7.16) turns out to be of order $1/n$
and will not be considered.

Hence, dividing by $(n - 1)S_x^2$, the leading term in the bias of $\bar{y}_{lr}$
is, to terms of order $1/n$,

$$\frac{-(N - n)}{(n - 1) N S_x^2} \left[ \frac{\sum_{i=1}^{N} e_i (x_i - \bar{X})^2}{(N - 1)} \right] \tag{7.17}$$

The expression inside the brackets is the population covariance between
$e_i$ and $(x_i - \bar{X})^2$; it represents a contribution from the quadratic
regression of $y_i$ on $x_i$ and vanishes if the relation between $y_i$
and $x_i$ is linear. Since the bias of $\bar{y}_{lr}$ is of order $1/n$, while its standard
error is of order $1/\sqrt{n}$, the bias becomes negligible in large samples.
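The leading term (7.17) is easily evaluated when the whole population is available. The following sketch is illustrative only: the function name and the two artificial populations (one exactly linear, one exactly quadratic in $x_i$) are assumptions made for the example.

```python
from statistics import mean

def leading_bias(y, x, n):
    """Leading term (7.17) of the bias of y_lr in samples of size n."""
    N = len(y)
    Y_bar, X_bar = mean(y), mean(x)
    sxx = sum((xi - X_bar) ** 2 for xi in x)
    B = sum((yi - Y_bar) * (xi - X_bar) for yi, xi in zip(y, x)) / sxx
    S_x2 = sxx / (N - 1)
    # Population covariance between e_i and (x_i - X_bar)^2.
    cov = sum(((yi - Y_bar) - B * (xi - X_bar)) * (xi - X_bar) ** 2
              for yi, xi in zip(y, x)) / (N - 1)
    return -(N - n) / ((n - 1) * N * S_x2) * cov

# Hypothetical populations of N = 20 units.
x_pop = [float(i) for i in range(1, 21)]
y_linear = [3.0 + 2.0 * xi for xi in x_pop]   # straight line: no bias term
y_quadratic = [xi ** 2 for xi in x_pop]       # curved relation

print(leading_bias(y_linear, x_pop, n=5))
print(leading_bias(y_quadratic, x_pop, n=5))
```

For the straight-line population the term vanishes; for the convex population it is negative, and its size shrinks roughly in proportion to 1/n as the sample size grows.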


7.5 Comparison with the ratio estimate and the mean per unit. For
these comparisons the sample size $n$ must be sufficiently large so that
the approximate formulas for the variances of the ratio and regression
estimates are valid. The three comparable variances for the
estimated population mean $\bar{Y}$ are as follows:

$$V(\bar{y}_{lr}) = \frac{(N - n)}{Nn} S_y^2 (1 - \rho^2) \qquad \text{(Regression)}$$

$$V(\bar{y}_R) = \frac{(N - n)}{Nn} (S_y^2 + R^2 S_x^2 - 2R\rho S_y S_x) \qquad \text{(Ratio)}$$

$$V(\bar{y}) = \frac{(N - n)}{Nn} S_y^2 \qquad \text{(Mean per unit)}$$

It is apparent that the variance of the regression estimate is smaller
than that of the mean per unit unless $\rho = 0$, in which case the two
variances are equal.

The variance of the regression estimate is less than that of the ratio
estimate if

$$-\rho^2 S_y^2 < R^2 S_x^2 - 2R\rho S_y S_x$$

This is equivalent to the inequality

$$(\rho S_y - R S_x)^2 > 0 \tag{7.18}$$

Therefore the regression estimate is more precise than the ratio
estimate unless

$$\rho = \frac{R S_x}{S_y} = \frac{\text{Coefficient of variation of } x_i}{\text{Coefficient of variation of } y_i} \tag{7.19}$$

since $R = \bar{Y}/\bar{X}$.

Equation (7.19) holds whenever the relation between $y_i$ and $x_i$ is a
straight line through the origin, and in this event the regression and
ratio estimates are equally precise. It is interesting that the regression
estimate is as precise as the ratio estimate, even when the latter
is a best unbiased estimate.

Actually, the ratio estimate is a particular case of the linear regression
estimate. If we take the regression coefficient $b' = \bar{y}/\bar{x}$, a value
that might be considered appropriate if the line was thought to pass
through the origin, we have

$$\bar{y}_{lr} = \bar{y} + \frac{\bar{y}}{\bar{x}} (\bar{X} - \bar{x}) = \frac{\bar{y}}{\bar{x}} \bar{X} = \bar{y}_R$$

The regression estimate is more laborious to compute than the ratio
estimate, principally owing to the labor of computing $b$. With a large
sample, an inefficient estimate $b'$ can be used if this produces a saving
in time. In section 7.3 it was pointed out that the contribution to
$V(\bar{y}_{lr})$ from the sampling error of the least squares $b$ amounts to about
$1/n$th of the principal component of the variance. Consequently, an
estimate $b'$ which effectively uses half the data, i.e. is of 50 per cent
efficiency, increases $V(\bar{y}_{lr})$ from

$$\frac{S_y^2 (1 - \rho^2)}{n} \left( 1 + \frac{1}{n} \right)$$

to

$$\frac{S_y^2 (1 - \rho^2)}{n} \left( 1 + \frac{2}{n} \right)$$

If $n$ is large this increase is trivial. Thus we may estimate $B$ and
$S_{y.x}$ from a subsample of the data. If there is good evidence that the
true regression is straight, the subsample may consist of, say, every
fifth or every ninth unit in the sample. If there is doubt whether the
regression is straight, the subsample should be a random one, or
essentially equivalent to this.

Sometimes it may be possible to guess a value of $b$ from previous
experience. For any constant value of $b$, say $b^*$, which does not depend
on the results of the sample, $\bar{y}_{lr}$ is an unbiased estimate, since in
repeated simple random samples

$$E\,\bar{y}_{lr} = \bar{Y} + b^* E(\bar{X} - \bar{x}) = \bar{Y}$$

Example. The precision of the regression, ratio, and mean per unit
estimates from a simple random sample can be compared using data
collected in the complete enumeration of peach orchards described on
p. 135. In this example, $y_i$ is the estimated peach production in an
orchard and $x_i$ the number of peach trees in the orchard. We will
compare the estimates of the total production of the 256 orchards,
as made from a sample of 100 orchards. It is doubtful whether the
sample is large enough to make the variance formulas fully valid,
since the cv's of $\bar{y}$ and $\bar{x}$ are both somewhat higher than 10 per cent,
but the example will serve to illustrate the computations. The basic
data are as follows:

$$S_y^2 = 6409 : \quad S_{yx} = 4434 : \quad S_x^2 = 3898$$

$$R = 1.270 : \quad \rho = 0.887 : \quad n = 100 : \quad N = 256$$

$$V(\hat{Y}_{lr}) = \frac{N(N - n)}{n} S_y^2 (1 - \rho^2)$$

$$= \frac{(256)(156)}{100} (6409)(1 - 0.787) = 545,000$$

$$V(\hat{Y}_R) = \frac{N(N - n)}{n} (S_y^2 + R^2 S_x^2 - 2R S_{yx})$$

$$= \frac{(256)(156)}{100} [6409 + (1.613)(3898) - 2(1.270)(4434)] = 573,000$$

$$V(\hat{Y}) = \frac{N(N - n)}{n} S_y^2 = 2,559,000$$

There is little to choose between the regression and the ratio estimates,
as might be expected from the nature of the variables. Both
techniques are greatly superior to the mean per unit.
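The arithmetic of this example can be retraced in a few lines of Python. The figures are those quoted in the example; the variable names are ours, and $\rho$ is recomputed from $S_{yx}$, $S_y^2$ and $S_x^2$ rather than taken as the rounded value 0.887.

```python
# Basic data quoted in the peach orchard example.
N, n = 256, 100
S_y2, S_yx, S_x2 = 6409.0, 4434.0, 3898.0
R = 1.270

factor = N * (N - n) / n                 # multiplier for variances of totals
rho = S_yx / (S_y2 * S_x2) ** 0.5        # population correlation coefficient

v_regression = factor * S_y2 * (1 - rho ** 2)
v_ratio = factor * (S_y2 + R ** 2 * S_x2 - 2 * R * S_yx)
v_mean_per_unit = factor * S_y2

for name, v in [("regression", v_regression),
                ("ratio", v_ratio),
                ("mean per unit", v_mean_per_unit)]:
    print(f"{name:14s} {v:12,.0f}")
```

To the nearest thousand the three values agree with those printed in the example, and the ordering regression < ratio < mean per unit is the one discussed in the text.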


7.6 The regression estimate in stratified sampling. As with the ratio
estimate, there are two types of regression estimate that can be made
in stratified random sampling. For the first estimate, $\bar{y}_{lrs}$ (s for
separate), we compute a separate regression coefficient in each stratum.
This estimate is appropriate to a mathematical model of the
form

$$y_{hi} = \bar{Y}_h + B_h (x_{hi} - \bar{X}_h) + e_{hi} \tag{7.20}$$

where as usual $h$ denotes the stratum and $i$ the observation within
the stratum, and where we believe that $B_h$ varies from one stratum to
another.

We first compute the regression estimate of each stratum mean, i.e.

$$\bar{y}_{lrh} = \bar{y}_h + b_h (\bar{X}_h - \bar{x}_h)$$

where $b_h$ is the least squares estimate of $B_h$. Then

$$\bar{y}_{lrs} = \frac{\sum_{h=1}^{L} N_h \bar{y}_{lrh}}{N} \tag{7.21}$$

There are two types of approach to the sampling theory for $\bar{y}_{lrs}$,
corresponding to the two approaches made with simple random sampling.
On the one hand we may assume that the population size in
each stratum is infinite and that the regression really is linear within
each stratum, so that the results of standard least squares theory may
be applied. These assumptions are not too unrealistic for some
applications (e.g. in agricultural sampling). On the other hand there is
the large-sample theory (as in section 7.2) which does not assume an
infinite population or a linear regression. Since both results may be
useful on occasion, two versions of the theorem for $V(\bar{y}_{lrs})$ will be
given. The elementary theory will be presented first.

Theorem 7.3a Suppose that each stratum may be regarded as infinite
and that

$$y_{hij} = \bar{Y}_h + B_h (x_{hi} - \bar{X}_h) + e_{hij}$$

where for any fixed $x_{hi}$

$$E(e_{hij}) = 0 : \quad E(e_{hij}^2) = S_{eh}^2$$

Then, with stratified random sampling, $\bar{y}_{lrs}$ is an unbiased estimate of
$\bar{Y}$, with variance

$$V(\bar{y}_{lrs}) = \sum_{h=1}^{L} \left( \frac{N_h}{N} \right)^2 S_{yh}^2 (1 - \rho_h^2) \left[ \frac{1}{n_h} + \frac{(\bar{x}_h - \bar{X}_h)^2}{\sum_i (x_{hi} - \bar{x}_h)^2} \right] \tag{7.22}$$

Proof: Applying theorem 7.1a to stratum $h$, we deduce that $\bar{y}_{lrh}$ is
an unbiased estimate of $\bar{Y}_h$, with variance

$$V(\bar{y}_{lrh}) = S_{yh}^2 (1 - \rho_h^2) \left\{ \frac{1}{n_h} + \frac{(\bar{x}_h - \bar{X}_h)^2}{\sum_i (x_{hi} - \bar{x}_h)^2} \right\}$$

Since

$$\bar{y}_{lrs} = \sum_{h=1}^{L} \frac{N_h}{N} \bar{y}_{lrh}$$

it follows that $\bar{y}_{lrs}$ is an unbiased estimate of $\bar{Y}$ and that its variance
is as given in (7.22).

Theorem 7.3 If the sample is large in every stratum,

$$V(\bar{y}_{lrs}) \doteq \sum_{h=1}^{L} \frac{N_h (N_h - n_h)}{N^2 n_h} S_{yh}^2 (1 - \rho_h^2)$$

Proof: In this version we do not assume the existence of a linear
regression. As in section 7.2, we define

$$B_h = \frac{\sum_{i=1}^{N_h} (y_{hi} - \bar{Y}_h)(x_{hi} - \bar{X}_h)}{\sum_{i=1}^{N_h} (x_{hi} - \bar{X}_h)^2}$$

Similarly the residual variate, $e_{hi}$, is defined by the equation

$$y_{hi} = \bar{Y}_h + B_h (x_{hi} - \bar{X}_h) + e_{hi}$$

The results of section 7.2 may now be applied to $\bar{y}_{lrh}$. The bias in
$\bar{y}_{lrh}$ is of order $1/n_h$, and its variance is, approximately,

$$V(\bar{y}_{lrh}) \doteq \frac{(N_h - n_h)}{N_h n_h} S_{yh}^2 (1 - \rho_h^2)$$

Consequently, by the definition of $\bar{y}_{lrs}$, its bias is at most of order
$1/n_h'$, where $n_h'$ is the smallest of the $n_h$. Since the sampling in
different strata is independent,

$$V(\bar{y}_{lrs}) = \sum_{h=1}^{L} \left( \frac{N_h}{N} \right)^2 V(\bar{y}_{lrh}) \doteq \sum_{h=1}^{L} \frac{N_h (N_h - n_h)}{N^2 n_h} S_{yh}^2 (1 - \rho_h^2)$$

Corollary. If the samples are large in every stratum, an estimate
of $V(\bar{y}_{lrs})$ which is practically unbiased is

$$v(\bar{y}_{lrs}) = \sum_{h=1}^{L} \frac{N_h (N_h - n_h)}{N^2 n_h} s_{y.xh}^2 \tag{7.23}$$

where

$$s_{y.xh}^2 = \frac{1}{(n_h - 2)} \left\{ \sum_i (y_{hi} - \bar{y}_h)^2 - b_h^2 \sum_i (x_{hi} - \bar{x}_h)^2 \right\}$$

This result also follows from the argument in section 7.2.

The estimate $\bar{y}_{lrs}$ suffers from the same difficulty as the corresponding
ratio estimate, in that the bias may have the same sign in every
stratum. If the strata are numerous, the ratio of the bias of $\bar{y}_{lrs}$ to
its standard error may become appreciable. Since, as shown in section
7.4, the leading term in the bias comes from the quadratic regression
of $y_{hi}$ on $x_{hi}$, this danger is most acute when the relation between
the variates approximates the quadratic rather than the linear type.
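The computation of the separate estimate, equations (7.20) and (7.21), may be sketched as follows. The data layout (a list of dictionaries with keys `N`, `X_bar`, `y`, `x`), the function names, and the two artificial strata are all assumptions made for illustration; within each stratum y is taken to be exactly linear in x, with a different slope in each, so that the estimate should recover the true mean.

```python
from statistics import mean

def ls_slope(y, x):
    """Least squares slope b_h within one stratum."""
    y_bar, x_bar = mean(y), mean(x)
    sxy = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(y, x))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def separate_regression_estimate(strata):
    """y_lrs of (7.21): weighted mean of the stratum regression estimates."""
    N_total = sum(s["N"] for s in strata)
    total = 0.0
    for s in strata:
        b_h = ls_slope(s["y"], s["x"])
        y_lrh = mean(s["y"]) + b_h * (s["X_bar"] - mean(s["x"]))
        total += s["N"] * y_lrh
    return total / N_total

# Hypothetical strata: y = 1 + 2x in stratum 1, y = 10 + 0.5x in stratum 2.
strata = [
    {"N": 40, "X_bar": 6.0, "x": [2.0, 4.0, 9.0], "y": [5.0, 9.0, 19.0]},
    {"N": 60, "X_bar": 12.0, "x": [10.0, 14.0, 18.0], "y": [15.0, 17.0, 19.0]},
]
print(separate_regression_estimate(strata))
```

Here the stratum means of y are 13 and 16, so the true population mean is (40)(13)/100 + (60)(16)/100 = 14.8, which the separate estimate reproduces because each within-stratum regression is exactly linear.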


7.7 The combined regression estimate. The second estimate, $\bar{y}_{lrc}$
(c for combined), is appropriate when $B_h$ is presumed to be the same
in all strata. The model then becomes

$$y_{hi} = \bar{Y}_h + B(x_{hi} - \bar{X}_h) + e_{hi}$$

To compute $\bar{y}_{lrc}$, we first find

$$\bar{y}_{st} = \frac{\sum_h N_h \bar{y}_h}{N} : \quad \bar{x}_{st} = \frac{\sum_h N_h \bar{x}_h}{N}$$

These are the usual estimates appropriate to stratified sampling. Then

$$\bar{y}_{lrc} = \bar{y}_{st} + b(\bar{X} - \bar{x}_{st}) \tag{7.24}$$

For $b$ it is often satisfactory to take the customary combined estimate

$$b = \frac{\sum_{h=1}^{L} \sum_{i=1}^{n_h} (y_{hi} - \bar{y}_h)(x_{hi} - \bar{x}_h)}{\sum_{h=1}^{L} \sum_{i=1}^{n_h} (x_{hi} - \bar{x}_h)^2} \tag{7.25}$$

This is not in general the most precise estimate of $B$. The variance of
$b_h$, the estimate in stratum $h$, is

$$V(b_h) = \frac{S_{y.xh}^2}{\sum_i (x_{hi} - \bar{x}_h)^2}$$

where $S_{y.xh}^2 = S_{yh}^2 (1 - \rho_h^2)$. The most precise estimate of $B$ is,
theoretically, obtained by weighting each $b_h$ inversely as its variance.
This will be found to give

$$b_{opt} = \frac{\sum_h \sum_i g_h (y_{hi} - \bar{y}_h)(x_{hi} - \bar{x}_h)}{\sum_h \sum_i g_h (x_{hi} - \bar{x}_h)^2}$$

where $g_h = 1/S_{y.xh}^2$. This estimate reduces to (7.25) only if the
residual variances are the same in all strata. In practice, $b_{opt}$ cannot be
used, because we have to insert sample estimates of $S_{y.xh}$, with a
resultant loss of precision from errors in these estimates. These errors
are small when the samples within strata are large, but in that event
the sampling error of $b$ makes only a negligible contribution to $V(\bar{y}_{lrc})$.
Consequently any improvement on the customary combined estimate
of $B$ will probably be small unless the total sample size is, say, less
than 50, and there are large differences between the residual variances
in different strata.
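A sketch of the combined estimate, using the customary combined slope (7.25) in (7.24), is given below. The data layout, function names, and artificial strata are hypothetical; the two strata share the slope B = 2 but have different intercepts, which is the situation for which the combined estimate is intended.

```python
from statistics import mean

def combined_slope(strata):
    """Customary combined estimate b of (7.25), pooling within-stratum sums."""
    num = den = 0.0
    for s in strata:
        y_bar, x_bar = mean(s["y"]), mean(s["x"])
        num += sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(s["y"], s["x"]))
        den += sum((xi - x_bar) ** 2 for xi in s["x"])
    return num / den

def combined_regression_estimate(strata, X_bar):
    """y_lrc of (7.24); X_bar is the known population mean of x."""
    N_total = sum(s["N"] for s in strata)
    y_st = sum(s["N"] * mean(s["y"]) for s in strata) / N_total
    x_st = sum(s["N"] * mean(s["x"]) for s in strata) / N_total
    return y_st + combined_slope(strata) * (X_bar - x_st)

# Hypothetical strata: y = 1 + 2x in stratum 1, y = 20 + 2x in stratum 2.
strata = [
    {"N": 40, "x": [2.0, 4.0, 9.0], "y": [5.0, 9.0, 19.0]},
    {"N": 60, "x": [10.0, 14.0, 18.0], "y": [40.0, 48.0, 56.0]},
]
X_bar = (40 * 6.0 + 60 * 12.0) / 100   # assumed stratum means of x: 6 and 12

print(combined_slope(strata), combined_regression_estimate(strata, X_bar))
```

With stratum means of x equal to 6 and 12, the stratum means of y are 13 and 44 and the population mean is 31.6; the combined estimate returns this value because the common slope is recovered exactly.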


In presenting the elementary form of the result for $V(\bar{y}_{lrc})$, we shall
suppose that the sample regression coefficient, $b'$ say, is some weighted
mean of the $b_h$, where the weights depend only on the $x_{hi}$. Such a
function includes, as particular cases, both the customary combined
$b$ and $b_{opt}$, and enables $V(\bar{y}_{lrc})$ to be stated slightly more generally.

Theorem 7.4a Suppose that each stratum may be regarded as infinite
and that

$$y_{hij} = \bar{Y}_h + B(x_{hi} - \bar{X}_h) + e_{hij} \tag{7.26}$$

where, for any fixed $x_{hi}$,

$$E(e_{hij}) = 0 : \quad E(e_{hij}^2) = S_{eh}^2$$

Then, with stratified random sampling, the estimate

$$\bar{y}_{lrc} = \bar{y}_{st} + b'(\bar{X} - \bar{x}_{st})$$

is an unbiased estimate of $\bar{Y}$ with variance

$$V(\bar{y}_{lrc}) = \sum_{h=1}^{L} \left( \frac{N_h}{N} \right)^2 \frac{S_{yh}^2 (1 - \rho_h^2)}{n_h} + (\bar{x}_{st} - \bar{X})^2 V(b') \tag{7.27}$$

Proof: From (7.26) it follows that

$$\bar{y}_{st} = \bar{Y} + B(\bar{x}_{st} - \bar{X}) + \bar{e}_{st}$$

Hence the error of estimate

$$\bar{y}_{lrc} - \bar{Y} = \bar{y}_{st} + b'(\bar{X} - \bar{x}_{st}) - \bar{Y}$$
$$= \bar{e}_{st} + (b' - B)(\bar{X} - \bar{x}_{st})$$

Under the conditions stated, it follows in the usual way that $b'$ is an
unbiased estimate of $B$, so that $\bar{y}_{lrc}$ is unbiased. Further, the covariance
of $\bar{e}_{st}$ and $b'$ may be shown to be zero. Hence

$$V(\bar{y}_{lrc}) = V(\bar{e}_{st}) + (\bar{x}_{st} - \bar{X})^2 V(b')$$

$$= \sum_{h=1}^{L} \left( \frac{N_h}{N} \right)^2 \frac{1}{n_h} S_{yh}^2 (1 - \rho_h^2) + (\bar{x}_{st} - \bar{X})^2 V(b')$$

Corollary 1 If

$$b' = b = \frac{\sum_h \sum_i (y_{hi} - \bar{y}_h)(x_{hi} - \bar{x}_h)}{\sum_h \sum_i (x_{hi} - \bar{x}_h)^2}$$

then

$$V(\bar{y}_{lrc}) = \sum_{h=1}^{L} S_{yh}^2 (1 - \rho_h^2) \left[ \left( \frac{N_h}{N} \right)^2 \frac{1}{n_h} + (\bar{x}_{st} - \bar{X})^2 \frac{\sum_i (x_{hi} - \bar{x}_h)^2}{\{\sum_h \sum_i (x_{hi} - \bar{x}_h)^2\}^2} \right] \tag{7.28}$$

To prove this, we have

$$b = \frac{\sum_h b_h \sum_i (x_{hi} - \bar{x}_h)^2}{\sum_h \sum_i (x_{hi} - \bar{x}_h)^2}$$

Also

$$V(b_h) = \frac{S_{yh}^2 (1 - \rho_h^2)}{\sum_i (x_{hi} - \bar{x}_h)^2}$$

The result follows by applying the usual formula for the variance of a
linear function.

Corollary 2 There are various particular forms of this result, according
to the type of allocation adopted. For instance, if $S_{yh}^2 (1 - \rho_h^2)$
is constant in all strata and proportional allocation is used, formula
(7.28) becomes

$$V(\bar{y}_{lrc}) = S_y^2 (1 - \rho^2) \left[ \frac{1}{n} + \frac{(\bar{x}_{st} - \bar{X})^2}{\sum_h \sum_i (x_{hi} - \bar{x}_h)^2} \right] \tag{7.29}$$

where $S_y^2 (1 - \rho^2)$ denotes the common value of the $S_{yh}^2 (1 - \rho_h^2)$.

With simple random sampling, the contribution of the sampling
error of $b$ to $V(\bar{y}_{lr})$ was found to be approximately $1/n$ times the total
variance. Unfortunately this result does not always hold for $V(\bar{y}_{lrc})$.
For equation (7.29) the result is valid, but in the more general expression
(7.28) it sometimes happens that the major contribution to the
variance comes from a single stratum, say stratum $h$. An examination
of formula (7.28), which will not be presented, shows that in this
situation the contribution from $V(b)$ may be as large as $1/n_h$ times the
total variance. In samples of moderate size it is therefore advisable
to check that the contribution of $V(b)$ is negligible before discarding it.

The more general theory for $\bar{y}_{lrc}$, in which the assumption of a linear
regression is relaxed, becomes quite complicated. We shall carry it
only far enough to exhibit, in a general way, what happens. The
within-stratum regression coefficients are defined as follows:

$$B_h = \frac{\sum_{i=1}^{N_h} (y_{hi} - \bar{Y}_h)(x_{hi} - \bar{X}_h)}{\sum_{i=1}^{N_h} (x_{hi} - \bar{X}_h)^2}$$

The residual variates $e_{hi}$ are defined by the equations

$$y_{hi} = \bar{Y}_h + B_h (x_{hi} - \bar{X}_h) + e_{hi}$$

Hence, as deductions from this equation,

$$\bar{y}_h = \bar{Y}_h + B_h (\bar{x}_h - \bar{X}_h) + \bar{e}_h$$

$$\bar{y}_{st} = \bar{Y} + \sum_h W_h B_h (\bar{x}_h - \bar{X}_h) + \bar{e}_{st} \tag{7.30}$$

where $W_h = N_h/N$. Now

$$\bar{y}_{lrc} = \bar{y}_{st} + b'(\bar{X} - \bar{x}_{st})$$

Substitute for $\bar{y}_{st}$ from equation (7.30). Hence the error of estimate is

$$\bar{y}_{lrc} - \bar{Y} = \sum_h W_h B_h (\bar{x}_h - \bar{X}_h) + b'(\bar{X} - \bar{x}_{st}) + \bar{e}_{st}$$

At this point it is convenient to introduce the symbol $B' = E(b')$.
The previous expression may be written as

$$\bar{y}_{lrc} - \bar{Y} = \sum_h W_h B_h (\bar{x}_h - \bar{X}_h) - B'(\bar{x}_{st} - \bar{X}) - (b' - B')(\bar{x}_{st} - \bar{X}) + \bar{e}_{st}$$

$$= \bar{e}_{st} + \sum_h W_h (\bar{x}_h - \bar{X}_h)(B_h - B') - (b' - B')(\bar{x}_{st} - \bar{X})$$

since $\sum W_h \bar{x}_h = \bar{x}_{st}$, and $\sum W_h \bar{X}_h = \bar{X}$.

This analysis divides the error of estimate into three components.
The first is the familiar term $\bar{e}_{st}$. The second arises from any variation,
from one stratum to another, in the true within-stratum regression
coefficients $B_h$. If the $B_h$ are all equal to $B$, this term vanishes
provided that $b'$ is an unbiased estimate of this $B$. Since $E(\bar{x}_h) = \bar{X}_h$,
this term does not introduce any bias into $\bar{y}_{lrc}$, but it does contribute
to the variance of $\bar{y}_{lrc}$.

The third term represents the contribution of the sampling error of
$b'$. As in simple random sampling, the mean value of this term is not
zero unless the regression is actually linear: the leading term in the
bias comes from the quadratic regression of $y_{hi}$ on $x_{hi}$. If the variability
is approximately the same in all strata and proportional sampling
is used, the bias is of order $1/n$, but it may be larger if the contribution
from one stratum is dominating. The same remarks apply
to the contribution from $V(b')$.

If the bias and variability arising from the term in $b'$ can be ignored,
the leading part of $V(\bar{y}_{lrc})$ is

$$V(\bar{y}_{lrc}) = \sum_h \frac{N_h (N_h - n_h)}{N^2 n_h} \left[ S_{yh}^2 (1 - \rho_h^2) + S_{xh}^2 (B_h - B')^2 \right]$$


7.8 Estimated variances. In $V(\bar{y}_{lrs})$, we substitute for $S_{y.xh}^2 =
S_{yh}^2 (1 - \rho_h^2)$ the sample estimate

$$s_{y.xh}^2 = \frac{1}{(n_h - 2)} \sum_i \{(y_{hi} - \bar{y}_h) - b_h (x_{hi} - \bar{x}_h)\}^2$$

$$= \frac{1}{(n_h - 2)} \left[ \sum_i (y_{hi} - \bar{y}_h)^2 - \frac{\{\sum_i (y_{hi} - \bar{y}_h)(x_{hi} - \bar{x}_h)\}^2}{\sum_i (x_{hi} - \bar{x}_h)^2} \right]$$

With the combined estimate, $s_{y.xh}^2$ may be taken as

$$s_{y.xh}^2 = \frac{1}{(n_h - 1)} \sum_i \{(y_{hi} - \bar{y}_h) - b(x_{hi} - \bar{x}_h)\}^2$$

The divisor $(n_h - 1)$ is suggested, instead of $(n_h - 2)$, because a common
$b$ has been employed in all strata. [As an "intuitive" approximation,
the divisor $\left( n_h - 1 - \frac{1}{L} \right)$ might seem better.] To avoid
computing the deviations, the numerator of $s_{y.xh}^2$ may be calculated as

$$\sum_i (y_{hi} - \bar{y}_h)^2 - 2b \sum_i (y_{hi} - \bar{y}_h)(x_{hi} - \bar{x}_h) + b^2 \sum_i (x_{hi} - \bar{x}_h)^2$$

It is advisable to insert the individual values of $s_{y.xh}^2$ in their
respective strata, rather than attempt any pooling, unless there is good
evidence that $S_{y.xh}^2$ does not vary from stratum to stratum.
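The estimates of this section may be sketched as follows: the residual variances about the separate slopes $b_h$ (divisor $n_h - 2$), their use in $v(\bar{y}_{lrs})$ of (7.23), and the residual variances about a common slope (divisor $n_h - 1$). The function names, data layout, and the two small strata are hypothetical, chosen only so that the arithmetic is easy to follow by hand.

```python
from statistics import mean

def corrected_sums(y, x):
    """Sums of squares and products about the stratum sample means."""
    y_bar, x_bar = mean(y), mean(x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(y, x))
    return syy, sxx, sxy

def residual_variance(y, x, b, divisor):
    """s^2_{y.xh}, with the numerator formed without computing deviations."""
    syy, sxx, sxy = corrected_sums(y, x)
    return (syy - 2 * b * sxy + b ** 2 * sxx) / divisor

def variance_separate(strata):
    """v(y_lrs) of (7.23): own slope b_h and divisor n_h - 2 in each stratum."""
    N = sum(s["N"] for s in strata)
    v = 0.0
    for s in strata:
        n_h = len(s["y"])
        _, sxx, sxy = corrected_sums(s["y"], s["x"])
        s2 = residual_variance(s["y"], s["x"], sxy / sxx, n_h - 2)
        v += s["N"] * (s["N"] - n_h) / (N ** 2 * n_h) * s2
    return v

def residual_variances_combined(strata):
    """s^2_{y.xh} about the common slope b of (7.25), divisor n_h - 1."""
    sums = [corrected_sums(s["y"], s["x"]) for s in strata]
    b = sum(t[2] for t in sums) / sum(t[1] for t in sums)
    return [residual_variance(s["y"], s["x"], b, len(s["y"]) - 1) for s in strata]

# Hypothetical strata with n_h = 4 units sampled from N_h = 20 and 30.
strata = [
    {"N": 20, "y": [2.0, 4.0, 7.0, 9.0], "x": [1.0, 2.0, 3.0, 4.0]},
    {"N": 30, "y": [5.0, 6.0, 8.0, 13.0], "x": [2.0, 4.0, 6.0, 8.0]},
]
v_sep = variance_separate(strata)
s2_comb = residual_variances_combined(strata)
print(v_sep, s2_comb)
```

In this artificial case the within-stratum slopes differ (2.4 and 1.3), so the residual variances about the common slope 1.52 are noticeably larger than those about the separate slopes.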


7.9 Comparison of the two types of regression estimate. Hard and
fast rules cannot be given to decide whether the separate or the combined
estimate is better in any specific situation: some exercise of
judgment is required in making a choice. The defects of the separate
estimate are that it is more liable to bias when samples are small
within the individual strata, and that its variance has a larger
contribution from sampling errors in the regression coefficients. The defect
of the combined estimate is that its variance is inflated if the population
regression coefficients differ from stratum to stratum.

If we are confident that the regressions are linear and if $B_h$ appears
to be the same in all strata, so far as can be judged, the combined
estimate is to be preferred. If the customary combined regression $b$ has
been used for $B$, the sample estimate of $V(\bar{y}_{lrc})$ is obtained by
substituting the quantities $s_{y.xh}^2$ into formula (7.28).

If the regressions appear linear (so that the danger of bias seems
small) but $B_h$ seems to vary from stratum to stratum, the separate
estimate is advisable. A sample estimate of its variance is obtained
by substituting the values $s_{y.xh}^2$ into formula (7.22).

If there is some curvilinearity in the regressions when a linear
regression estimate is used, the combined estimate is probably safer
unless the samples are large in all strata.


7.10 Exercises.
7.1 A population contains 6 units, with the following values of $y_i$ and $x_i$:

Unit

By working out all possible cases, compare the precisions of the ratio and
linear regression estimates for simple random samples of size 2. Compute
the contributions of the bias to the variances.

7.2 From the sample data in table 6.1 (p. 113) compute the regression
estimate of the 1930 total number of inhabitants in the 196 large cities. Find
the standard error of this estimate, and compare its precision with that of
the ratio estimate.

7.3 In the previous exercise, find the estimated total number of inhabitants,
and its standard error, if $b$ is arbitrarily taken as 1.

7.4 By working out all possible cases, compare the precisions of the separate
and combined regression estimates of the total Y of the following population,
when simple random samples of size 2 are drawn from each stratum:

        Stratum 1                Stratum 2
   $x_{1i}$    $y_{1i}$      $x_{2i}$    $y_{2i}$
       4          0              5          7
       6          3              6         12
       7          5              8         13

Use the ordinary least squares estimates of the $B$'s, and formula (7.25) for $b_c$.

CHAPTER 8
SYSTEMATIC SAMPLING


8.1 Description. This method of sampling is at first sight quite dif- 
ferent from simple random sampling. Suppose that the N units in 
the population are numbered from 1 to N in some order. To select 
a sample of n units, we take a unit at random from the first k units, 
and every kth subsequent unit. For instance, if k is 15 and if the first 
unit drawn is number 13, the subsequent units are numbers 28, 43, 
58, and so on. The selection of the first unit determines the whole 
sample. This type of sample will be called an every kth systematic 
sample. - 
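The selection rule can be sketched in a few lines of Python. The function name and its arguments are hypothetical; units are numbered from 1 to N as in the text, and the population size N = 100 in the first call is an assumed value, since the text does not give one for this illustration.

```python
import random

def systematic_sample(N, k, start=None, rng=random):
    """Unit numbers of an every-kth systematic sample from units 1..N.

    start : first unit drawn; if None, it is chosen at random from 1..k.
    """
    if start is None:
        start = rng.randint(1, k)  # random start among the first k units
    return list(range(start, N + 1, k))

# The case in the text: k = 15, first unit number 13 (N = 100 is assumed).
print(systematic_sample(100, 15, start=13))

# Sizes of the k possible samples when N is not a multiple of k.
sizes = [len(systematic_sample(23, 5, start=s)) for s in range(1, 6)]
print(sizes)
```

The first call gives units 13, 28, 43, 58, and so on; the second shows that when N = 23 and k = 5 the possible samples do not all have the same size.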

The apparent advantages of this method over simple random sam- 
pling are as follows: 

i. It is easier to draw a sample and often easier to execute without 
mistakes. This is of particular advantage when the drawing is done 
in the field. Even when drawing is done in an office there may be a 
substantial saving in time. For instance, if the units are described 
on cards which are all of the same size and lie in a file drawer, a card 
can be drawn out every inch along the file as measured by a ruler. 
This operation is speedy, whereas simple random sampling would be 
slow. Of course, this method departs slightly from the strict “every 
kth” rule. 

ii. Intuitively, systematic sampling seems likely to be more precise
than simple random sampling. In effect, it stratifies the population
into n strata, which consist of the first k units, the second k units,
and so on. We might therefore expect the systematic sample to be
about as precise as the corresponding stratified random sample with
one unit per stratum. The difference is that with the systematic sample
the units all occur at the same relative position in the stratum,
whereas with the stratified random sample the position in the stratum
is determined separately by randomization within each stratum (see
figure 8.1). The systematic sample is spread more evenly over the
population, and this fact has sometimes made systematic sampling
considerably more precise than stratified random sampling.

One variant of the systematic sample is to choose each unit at or
near the center of the stratum; that is, instead of starting the sequence
by a random number chosen between 1 and k, we take the starting
number as (k + 1)/2 if k is odd, and either k/2 or (k + 2)/2 if k is
even. This procedure carries the idea of systematic sampling to its
logical conclusion. If $y_i$ can be considered a continuous function of a
continuous variable $i$, there are grounds for expecting that this centrally
located sample will be more precise than a randomly located
one. Little investigation of the efficacy of centrally located samples
has been made for the types of population usually encountered in
sample surveys, and attention will be confined to randomly located
samples.

[Figure 8.1 Systematic and stratified random sampling. Horizontal axis:
unit number, marked at k, 2k, 3k, 4k, 5k, 6k; x = systematic sample,
o = stratified random sample.]

Since N is not in general an integral multiple of k, different
systematic samples from the same finite population may vary by one
unit in size. Thus with N = 23, k = 5, the numbers of the units in the
five systematic samples are as shown in table 8.1. The first three
samples have n = 5, while the last two have n = 4. This fact
introduces a disturbance into the theory of systematic sampling. The
disturbance is probably negligible if n exceeds 50, and will be ignored,
for simplicity, in the presentation of theory. The disturbance is
unlikely to be large even when n is small.


TABLE 8.1 THE POSSIBLE SYSTEMATIC SAMPLES FOR N = 23, k = 5


         Systematic sample number

     I     II    III    IV     V
     1      2     3      4     5
     6      7     8      9    10
    11     12    13     14    15
    16     17    18     19    20
    21     22    23



8.2 An alternative view. There is another way of looking at syste- 
matic sampling. With N = nk, the k possible systematic samples are 
shown in the columns of table 8.2. It is evident from this table that 
the population has been divided into k large sampling units, each of 
which contains n of the original units. The operation of choosing a
randomly located systematic sample is just the operation of choosing
one of these large sampling units at random. Thus systematic sam- 
pling amounts essentially to the selection of a single complex sampling 
unit which constitutes the whole sample. In other words, systematic 
sampling is actually simple random sampling, applied to a set of k 
large units, with the restriction that in terms of these large units the 
sample size is 1. 


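TABLE 8.2 COMPOSITION OF THE k SYSTEMATIC SAMPLES


                               Sample number

            1               2           ...    i               ...    k
            y_1             y_2         ...    y_i             ...    y_k
            y_{k+1}         y_{k+2}     ...    y_{k+i}         ...    y_{2k}
            ...             ...                ...                    ...
            y_{(n-1)k+1}    y_{(n-1)k+2} ...   y_{(n-1)k+i}    ...    y_{nk}

Means       \bar{y}_1       \bar{y}_2   ...    \bar{y}_i       ...    \bar{y}_k


... which consists of a group or cluster of the original units is a
common device. The next three chapters ...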

8.3 Variance of the estimated mean. Several formulas have been
developed for the variance of $\bar{y}_{sy}$, the mean of a systematic
sample. The first three given below apply to any kind of cluster
sampling in which the clusters all contain n elements and the sample
consists of one cluster.

If N = nk, it is easy to verify that $\bar{y}_{sy}$ is an unbiased
estimate of $\bar{Y}$ for a randomly located systematic sample. If
N ≠ nk, this result does not hold, although the bias is unlikely to be
important. The bias can be avoided by allotting a higher probability
of selection to certain samples. Consider the example in table 8.1.
If a probability of selection 5/23 is given to each of the first three
samples, and a probability 4/23 to each of the last two, the sample
mean is unbiased.

In the following analysis, the symbol $y_{ij}$ denotes the jth member
of the ith systematic sample, so that j = 1, 2, ..., n; i = 1, 2, ..., k.
The mean of the ith sample is denoted by $\bar{y}_{i\cdot}$.

Theorem 8.1 The variance of the mean of a systematic sample is

$$V(\bar{y}_{sy}) = \frac{N-1}{N} S^2 - \frac{k(n-1)}{N} S_{wsy}^2 \qquad (8.1)$$

where

$$S_{wsy}^2 = \frac{1}{k(n-1)} \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2$$

is the variance among units which lie within the same systematic
sample. The denominator of this variance, k(n - 1), is constructed
by the usual rules in the analysis of variance: each of the k samples
contributes (n - 1) degrees of freedom to the sum of squares in the
numerator.


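Proof: By the usual identity of the analysis of variance

$$(N-1)S^2 = \sum_i \sum_j (y_{ij} - \bar{Y})^2 = n \sum_i (\bar{y}_{i\cdot} - \bar{Y})^2 + \sum_i \sum_j (y_{ij} - \bar{y}_{i\cdot})^2$$

But the variance of $\bar{y}_{sy}$ is by definition

$$V(\bar{y}_{sy}) = \frac{1}{k} \sum_{i=1}^{k} (\bar{y}_{i\cdot} - \bar{Y})^2$$

Hence,

$$(N-1)S^2 = nk\,V(\bar{y}_{sy}) + k(n-1) S_{wsy}^2$$

The result follows.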


Corollary. The mean of a systematic sample is more precise than
the mean of a simple random sample if and only if

$$S_{wsy}^2 > S^2 \qquad (8.2)$$

Proof: If $\bar{y}$ is the mean of a simple random sample of size n,

$$V(\bar{y}) = \frac{(N-n)}{N} \frac{S^2}{n}$$

From equation (8.1), $V(\bar{y}_{sy}) < V(\bar{y})$ if and only if

$$\frac{(N-1)}{N} S^2 - \frac{k(n-1)}{N} S_{wsy}^2 < \frac{(N-n)}{N} \frac{S^2}{n}$$

i.e. if

$$k(n-1) S_{wsy}^2 > \left[(N-1) - \frac{(N-n)}{n}\right] S^2 = k(n-1) S^2$$


This important result, which applies to cluster sampling in general, 
states that systematic sampling is more precise than simple random
sampling if the variance within the systematic samples is larger than
the population variance as a whole. Systematic sampling is precise 
when units within the same sample are heterogeneous, and is impre- 
cise when they are homogeneous. The result is obvious intuitively. 
If there is little variation within a systematic sample relative to that 
in the population, the successive units in the sample are repeating 
more or less the same information. 
Another form for the variance is given in theorem 8.2.
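
Theorem 8.1 and its corollary can be verified numerically on any
population with N = nk. The sketch below uses exact rational arithmetic;
the twelve population values and the function name `theorem_8_1` are
hypothetical choices made for illustration.

```python
from fractions import Fraction

def mean(v):
    return Fraction(sum(v), len(v))

def theorem_8_1(y, k):
    """Return V_sy computed directly, the right-hand side of (8.1),
    S^2, S_wsy^2 and V_ran for a population with N = n*k units."""
    N = len(y)
    n = N // k
    assert N == n * k, "the theorem assumes N = nk"
    Ybar = mean(y)
    S2 = sum((v - Ybar) ** 2 for v in y) / (N - 1)
    samples = [y[i::k] for i in range(k)]          # the k systematic samples
    V_sy = sum((mean(s) - Ybar) ** 2 for s in samples) / k
    S2_wsy = sum((v - mean(s)) ** 2 for s in samples for v in s) / (k * (n - 1))
    rhs = Fraction(N - 1, N) * S2 - Fraction(k * (n - 1), N) * S2_wsy
    V_ran = Fraction(N - n, N) * S2 / n
    return V_sy, rhs, S2, S2_wsy, V_ran

y = [2, 9, 4, 7, 5, 3, 8, 6, 1, 12, 10, 11]        # hypothetical values
V_sy, rhs, S2, S2_wsy, V_ran = theorem_8_1(y, k=3)
print(V_sy, rhs)                  # the two sides of (8.1)
print(V_sy < V_ran, S2_wsy > S2)  # the two conditions in the corollary
```

The two printed comparisons always agree, which is the content of the
corollary.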


Theorem 8.2

$$V(\bar{y}_{sy}) = \frac{S^2}{n} \left[\frac{(N-1)}{N} + (n-1)\rho_w\right] \qquad (8.3)$$

where

$$\rho_w = \frac{2}{kn(n-1)} \sum_{i=1}^{k} \sum_{j<u} (y_{ij} - \bar{Y})(y_{iu} - \bar{Y}) \Big/ S^2$$

This quantity may be described as the correlation coefficient between
pairs of units that are in the same systematic sample. The divisor
factor kn(n - 1)/2 is the number of distinct terms in the sum of
products.


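Proof:

$$n^2 k\,V(\bar{y}_{sy}) = n^2 \sum_{i=1}^{k} (\bar{y}_{i\cdot} - \bar{Y})^2 = \sum_{i=1}^{k} \left[(y_{i1} - \bar{Y}) + (y_{i2} - \bar{Y}) + \cdots + (y_{in} - \bar{Y})\right]^2$$

The squared terms amount to the total sum of squares of deviations
from $\bar{Y}$, i.e. to $(N-1)S^2$. This gives

$$n^2 k\,V(\bar{y}_{sy}) = (N-1)S^2 + 2 \sum_i \sum_{j<u} (y_{ij} - \bar{Y})(y_{iu} - \bar{Y}) = (N-1)S^2 + kn(n-1) S^2 \rho_w$$

Hence,

$$V(\bar{y}_{sy}) = \frac{S^2}{n} \left[\frac{(N-1)}{N} + (n-1)\rho_w\right]$$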


This result shows that positive correlation between units in the 
same sample inflates the variance of the sample mean. Even a small 
positive correlation may have a large effect, because of the multiplier 
(n — 1). 
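
As a numerical check on theorem 8.2, the within-sample correlation can
be computed from its definition (the sum of cross-products over the
kn(n - 1)/2 distinct pairs, divided by S^2) and substituted in the
variance formula. The sketch below assumes N = nk; the population
values and the name `rho_w` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def rho_w(y, k):
    """Correlation between pairs of units in the same systematic sample,
    using the divisor kn(n - 1)/2 and the population variance S^2."""
    N = len(y)
    n = N // k
    Ybar = Fraction(sum(y), N)
    S2 = sum((v - Ybar) ** 2 for v in y) / (N - 1)
    cross = sum((a - Ybar) * (b - Ybar)
                for i in range(k)
                for a, b in combinations(y[i::k], 2))
    return 2 * cross / (k * n * (n - 1) * S2), S2

y = [2, 9, 4, 7, 5, 3, 8, 6, 1, 12, 10, 11]   # hypothetical values
k = 3
N, n = len(y), len(y) // k
rho, S2 = rho_w(y, k)

# Variance from the theorem, and directly from the k sample means.
V_formula = S2 / n * (Fraction(N - 1, N) + (n - 1) * rho)
Ybar = Fraction(sum(y), N)
V_direct = sum((Fraction(sum(y[i::k]), n) - Ybar) ** 2 for i in range(k)) / k
print(rho, V_formula, V_direct)
```

Because rho_w is multiplied by (n - 1), a modest positive value can
account for most of the variance when n is large.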

The two previous theorems express $V(\bar{y}_{sy})$ in terms of $S^2$,
and hence relate it to the variance for a simple random sample. There
is an analogue of theorem 8.2 which expresses $V(\bar{y}_{sy})$ in terms
of the variance for a stratified random sample in which the strata are
composed of the first k units, the second k units, and so on. In our
notation the subscript j in $y_{ij}$ denotes the stratum. The stratum
mean will be written $\bar{y}_{\cdot j}$.


Theorem 8.3

$$V(\bar{y}_{sy}) = \frac{S_{wst}^2}{n} \left[\frac{(N-n)}{N} + (n-1)\rho_{wst}\right] \qquad (8.4)$$

where

$$S_{wst}^2 = \frac{1}{n(k-1)} \sum_{j=1}^{n} \sum_{i=1}^{k} (y_{ij} - \bar{y}_{\cdot j})^2$$

This is the variance among units that lie in the same stratum. The
divisor n(k - 1) is used because each of the n strata contributes
(k - 1) degrees of freedom. Further

$$\rho_{wst} = \frac{2}{kn(n-1)} \sum_{i=1}^{k} \sum_{j<u} (y_{ij} - \bar{y}_{\cdot j})(y_{iu} - \bar{y}_{\cdot u}) \Big/ S_{wst}^2 \qquad (8.5)$$

This rather complex quantity is the correlation coefficient between the
deviations from the stratum means of pairs of items that are in the
same systematic sample.

The proof is similar to that of theorem 8.2.

Corollary. A systematic sample has the same precision as the
corresponding stratified random sample, with one unit per stratum, if
$\rho_{wst} = 0$. This follows because for this type of stratified random
sample $V(\bar{y}_{st})$ is (theorem 5.3, corollary 2)

$$V(\bar{y}_{st}) = \left(\frac{N-n}{N}\right) \frac{S_{wst}^2}{n}$$

Other formulas for $V(\bar{y}_{sy})$, appropriate to an autocorrelated
population, have been given by W. G. and L. H. Madow (1944), who made
the first theoretical study of the precision of systematic sampling.

Example. The data in table 8.3 are for a small artificial population
which exhibits a fairly steady rising trend. We have N = 40,
k = 10, n = 4. Each column represents a systematic sample, and
the rows are the strata. The example illustrates the situation in
which the "within-stratum" correlation is positive. For instance, in
the first sample each of the four numbers 0, 6, 18, and 26 lies below
the mean of the stratum to which it belongs. This is consistently
true, with a few exceptions, in the first five systematic samples. In
the last five samples, deviations from the strata means are in most
cases positive. Thus the cross-product terms in $\rho_{wst}$ are
predominantly positive. From theorem 8.3 we should expect systematic
sampling to be less precise than stratified random sampling with one
unit per stratum.


TABLE 8.3 DATA FOR 10 SYSTEMATIC SAMPLES WITH n = 4, N = kn = 40


                    Systematic sample numbers                Strata
Strata      1    2    3    4    5    6    7    8    9   10    means

  I         0    1    1    2    5    4    7    7    8    6     4.1
  II        6    8    9   10   13   12   15   16   16   17    12.2
  III      18   19   20   20   24   23   25   28   29   27    23.3
  IV       26   30   31   31   33   32   35   37   38   38    33.1

Totals     50   58   61   63   75   71   82   88   91   88    72.7


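The variance $V(\bar{y}_{sy})$ is found directly from the systematic
sample totals as

$$V(\bar{y}_{sy}) = V_{sy} = \frac{1}{k} \sum_{i=1}^{k} (\bar{y}_{i\cdot} - \bar{Y})^2 = \frac{1}{n^2 k} \sum_{i=1}^{k} (n\bar{y}_{i\cdot} - n\bar{Y})^2$$

$$= \frac{1}{160} \left[(50)^2 + (58)^2 + \cdots + (88)^2 - \frac{(727)^2}{10}\right] = 11.63$$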


For random and stratified random sampling, we need an analysis
of variance of the population into "between rows" and "within rows."
This is presented in table 8.4.


TABLE 8.4 ANALYSIS OF VARIANCE


                           df        ss        ms
Between rows (strata)       3      4828.3
Within strata              36       485.5     13.49 = S_{wst}^2
Totals                     39      5313.8    136.25 = S^2


Hence the variances of the estimated means from simple random and
stratified random samples are as follows:

$$V_{ran} = \left(\frac{N-n}{N}\right) \frac{S^2}{n} = \frac{9}{10} \cdot \frac{136.25}{4} = 30.66$$

$$V_{st} = \left(\frac{N-n}{N}\right) \frac{S_{wst}^2}{n} = \frac{9}{10} \cdot \frac{13.49}{4} = 3.04$$




Both stratified random sampling and systematic sampling are much 
more effective than simple random sampling, but, as anticipated, sys- 
tematic sampling is less precise than stratified random sampling. 
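
The three variances in this example can be reproduced with a short
calculation. The sketch below is illustrative only: the variable names
are not from the text, and the data rows are those of table 8.3 (rows
are strata, columns are systematic samples).

```python
# Rows are strata I-IV; column j is systematic sample j (table 8.3).
rows = [
    [0, 1, 1, 2, 5, 4, 7, 7, 8, 6],
    [6, 8, 9, 10, 13, 12, 15, 16, 16, 17],
    [18, 19, 20, 20, 24, 23, 25, 28, 29, 27],
    [26, 30, 31, 31, 33, 32, 35, 37, 38, 38],
]
n, k = len(rows), len(rows[0])        # n = 4 strata, k = 10 samples
N = n * k
Ybar = sum(map(sum, rows)) / N

# Systematic sampling: variance of the k sample means.
totals = [sum(col) for col in zip(*rows)]
V_sy = sum((t / n - Ybar) ** 2 for t in totals) / k

# Population variance and pooled within-stratum variance.
S2 = sum((y - Ybar) ** 2 for r in rows for y in r) / (N - 1)
S2_wst = sum((y - sum(r) / k) ** 2 for r in rows for y in r) / (n * (k - 1))

V_ran = (N - n) / N * S2 / n
V_st = (N - n) / N * S2_wst / n
print(totals)
print(round(V_sy, 2), round(V_ran, 2), V_st)
```

Carrying full precision gives a value of V_st very slightly below the
3.04 quoted in the text, which is obtained from the rounded mean square
13.49.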

Table 8.5 shows the same data, with the order of the observations
reversed in the second and fourth strata. This has the effect of
making $\rho_{wst}$ negative, because it makes the majority of the
cross-products between deviations from the strata means negative for
pairs of observations that lie in the same systematic sample. In the
first systematic sample, for instance, the deviations from the strata
means are now -4.1, +4.8, -5.3, +4.9. Of the six products of pairs of
deviations, four are negative. Roughly the same situation applies in
every systematic sample.


TABLE 8.5 DATA IN TABLE 8.3, WITH THE ORDER REVERSED IN STRATA II AND IV


                    Systematic sample numbers                Strata
Strata      1    2    3    4    5    6    7    8    9   10    means

  I         0    1    1    2    5    4    7    7    8    6     4.1
  II       17   16   16   15   12   13   10    9    8    6    12.2
  III      18   19   20   20   24   23   25   28   29   27    23.3
  IV       38   38   37   35   32   33   31   31   30   26    33.1

Totals     73   74   74   72   73   73   73   75   75   65    72.7

This change does not affect $V_{ran}$ and $V_{st}$. With systematic
sampling, it brings about a dramatic increase in precision, as is seen
when the systematic sample totals in table 8.5 are compared with those
in table 8.3. We now have

$$V_{sy} = \frac{1}{160} \left[(73)^2 + (74)^2 + \cdots + (65)^2 - \frac{(727)^2}{10}\right] = 0.46$$

It is sometimes possible to exploit this result by numbering the units
so as to create negative correlations within strata. Accurate knowledge
of the trends within the population is required. However, as will be
seen later, the situation in table 8.5 is one in which it is very difficult
to obtain from the sample a good estimate of the standard error of
$\bar{y}_{sy}$.
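
The effect of reversing the order within alternate strata can be
reproduced as follows. This is a sketch under the assumption that the
rows below are the strata of table 8.3; the helper name
`v_sy_from_rows` is hypothetical.

```python
def v_sy_from_rows(rows):
    """Variance of the systematic sample mean; rows are strata and
    column j holds the members of systematic sample j."""
    n, k = len(rows), len(rows[0])
    Ybar = sum(map(sum, rows)) / (n * k)
    totals = [sum(col) for col in zip(*rows)]
    return totals, sum((t / n - Ybar) ** 2 for t in totals) / k

table_8_3 = [
    [0, 1, 1, 2, 5, 4, 7, 7, 8, 6],
    [6, 8, 9, 10, 13, 12, 15, 16, 16, 17],
    [18, 19, 20, 20, 24, 23, 25, 28, 29, 27],
    [26, 30, 31, 31, 33, 32, 35, 37, 38, 38],
]
# Reverse the order of the units in strata II and IV (table 8.5).
table_8_5 = [row[::-1] if j in (1, 3) else row
             for j, row in enumerate(table_8_3)]

totals_3, v3 = v_sy_from_rows(table_8_3)
totals_5, v5 = v_sy_from_rows(table_8_5)
print(totals_5)
print(round(v3, 2), round(v5, 2))
```

The stratum contents are unchanged by the reversal, so only the
systematic sampling variance moves.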


8.4 Comparison of systematic with stratified random sampling. The 
performance of systematic sampling relative to that of stratified or
simple random sampling is greatly dependent on the properties of the
population. There are populations for which systematic sampling is
extremely precise and others for which it is less precise than simple
random sampling. For some populations and some values of n,
$V(\bar{y}_{sy})$ may even increase when a larger sample is taken, a
startling departure from good behavior. Thus it is difficult to give
general advice about the situations in which systematic sampling is to
be recommended. A knowledge of the structure of the population is
necessary for its most effective use.

Two lines of research on this problem have been followed. One is 
to compare the different types of sampling on artificial populations in 
which $y_i$ is some simple function of i. The other is to make the
comparisons for natural populations. Both types of investigation are la-
borious and are not yet as extensive as an advisor on sampling would 
wish, assuming that he did not have to do the work. Some of the prin- 
cipal results are presented in the succeeding sections. 


8.5 Populations in “random” order. Systematic sampling is some- 
times used, for its convenience, in populations where the numbering 
of the units is effectively random. This is so in sampling from a file 
arranged alphabetically by surnames, if the item that is being meas- 
ured has no relation to the surname of the individual. There will then 
be no trend or stratification in $y_i$ as we proceed along the file, and no
correlation between neighboring values. 

In this situation we would expect systematic sampling to be essen- 
tially equivalent to simple random sampling and to have the same 
variance. For any single finite population, with given values of n
and k, this is not exactly true, because $V_{sy}$, which is based on only k
degrees of freedom, is rather erratic when k is small, and may turn out
to be either greater or smaller than $V_{ran}$. There are two results which
show that on the average the two variances are equal. Both results
will be reported, since they illustrate different approaches to the study
of systematic sampling.


Theorem 8.4 Consider all N! finite populations which are formed
by the N! permutations of any set of numbers $y_1, y_2, \ldots, y_N$. Then,
on the average over these finite populations,

$$E(V_{sy}) = V_{ran}$$

Note that $V_{ran}$ is the same for all permutations.


This result, which was proved by W. G. and L. H. Madow (1944),
shows that, if the order of the items in a specific finite population can
be regarded as drawn at random from the N! permutations, systematic
sampling is on the average equivalent to simple random sampling.
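
For a very small population, theorem 8.4 can be confirmed by brute
force, averaging V_sy over every ordering of the values. The sketch
below is illustrative: the six values, k = 2, and the function names
are hypothetical, and exact fractions are used so that the comparison
is not blurred by rounding.

```python
from fractions import Fraction
from itertools import permutations

def v_sy(y, k):
    """Variance of the systematic sample mean for the ordering y."""
    N = len(y)
    Ybar = Fraction(sum(y), N)
    means = [Fraction(sum(y[i::k]), len(y[i::k])) for i in range(k)]
    return sum((m - Ybar) ** 2 for m in means) / k

def v_ran(y, n):
    """Variance of the mean of a simple random sample of size n."""
    N = len(y)
    Ybar = Fraction(sum(y), N)
    S2 = sum((v - Ybar) ** 2 for v in y) / (N - 1)
    return Fraction(N - n, N) * S2 / n

values = [3, 1, 4, 1, 5, 9]          # hypothetical population, N = 6
k = 2
n = len(values) // k

orderings = list(permutations(values))            # all N! = 720 orderings
average_v_sy = sum(v_sy(p, k) for p in orderings) / len(orderings)
print(average_v_sy, v_ran(values, n))
```

Individual orderings give values of V_sy on both sides of V_ran; only
the average over orderings is tied to it.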

The second approach is to regard the finite population as drawn at 
random from an infinite super-population which has certain proper- 
ties. The result that is proved does not apply to any single finite 
population (i.e. to any specific set of values $y_1, y_2, \ldots, y_N$) but to the
average of all finite populations that can be drawn from the infinite 
population. This approach may appear at first sight to have little 
relation to practical applications, but this impression is erroneous. 
Any sampling method is used in practice on a whole series of finite 
populations. One way of describing the class of finite populations for 
which a given sampling method is efficient is to describe the infinite 
super-population from which such finite populations might have been 
drawn at random. 

The symbol $\mathcal{E}$ denotes averages over all finite populations
which can be drawn from this super-population.


Theorem 8.5 If the variates $y_i$ (i = 1, 2, ..., N) are drawn at
random from a super-population in which

$$\mathcal{E} y_i = \mu, \qquad \mathcal{E}(y_i - \mu)(y_j - \mu) = 0 \quad (i \neq j), \qquad \mathcal{E}(y_i - \mu)^2 = \sigma_i^2$$

then

$$\mathcal{E} V_{sy} = \mathcal{E} V_{ran}$$

The crucial conditions are that all $y_i$ have the same mean $\mu$, i.e.
there is no trend, and that no linear correlation exists between the
values $y_i$ and $y_j$ at two different points. The variance $\sigma_i^2$
may change from point to point in the series.

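Proof: For any specific finite population,

$$V_{ran} = \frac{(N-n)}{Nn} \cdot \frac{\sum_{i=1}^{N} (y_i - \bar{Y})^2}{(N-1)}$$

Now

$$\sum_{i=1}^{N} (y_i - \bar{Y})^2 = \sum_{i=1}^{N} \left[(y_i - \mu) - (\bar{Y} - \mu)\right]^2 = \sum_{i=1}^{N} (y_i - \mu)^2 - N(\bar{Y} - \mu)^2$$

Since $y_i$ and $y_j$ are uncorrelated (i ≠ j),

$$\mathcal{E}(\bar{Y} - \mu)^2 = \frac{1}{N^2} \sum_{i=1}^{N} \sigma_i^2$$

Hence

$$\mathcal{E} V_{ran} = \frac{(N-n)}{Nn(N-1)} \left[\sum_{i=1}^{N} \sigma_i^2 - \frac{1}{N} \sum_{i=1}^{N} \sigma_i^2\right]$$

This gives

$$\mathcal{E} V_{ran} = \frac{(N-n)}{N^2 n} \sum_{i=1}^{N} \sigma_i^2$$

Turning to $V_{sy}$, let $\bar{y}_u$ denote the mean of the uth systematic
sample. For any specific finite population,

$$V_{sy} = \frac{1}{k} \sum_{u=1}^{k} (\bar{y}_u - \bar{Y})^2 = \frac{1}{k} \left\{\sum_{u=1}^{k} (\bar{y}_u - \mu)^2 - k(\bar{Y} - \mu)^2\right\}$$

By the theorem for the variance of the mean of an uncorrelated
sample from an infinite population,

$$\mathcal{E} V_{sy} = \frac{1}{k} \left\{\frac{1}{n^2} \sum_{i=1}^{N} \sigma_i^2 - \frac{k}{N^2} \sum_{i=1}^{N} \sigma_i^2\right\} = \frac{(N-n)}{N^2 n} \sum_{i=1}^{N} \sigma_i^2 = \mathcal{E} V_{ran}$$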
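
Theorem 8.5 can be checked exactly on a toy super-population. In the
sketch below, which is an illustration and not the author's argument,
each y_i independently takes the value mu + sigma_i or mu - sigma_i with
probability 1/2, so that the conditions of the theorem hold with
unequal variances. The values of mu, the sigma_i, N and k are
hypothetical. All 2^N equally likely finite populations are enumerated.

```python
from fractions import Fraction
from itertools import product

def v_sy(y, k):
    N = len(y)
    Ybar = Fraction(sum(y), N)
    means = [Fraction(sum(y[i::k]), len(y[i::k])) for i in range(k)]
    return sum((m - Ybar) ** 2 for m in means) / k

def v_ran(y, n):
    N = len(y)
    Ybar = Fraction(sum(y), N)
    S2 = sum((v - Ybar) ** 2 for v in y) / (N - 1)
    return Fraction(N - n, N) * S2 / n

mu = 10                              # common mean (hypothetical)
sigmas = [1, 2, 3, 1, 2, 3]          # standard deviations, varying with i
N, k = len(sigmas), 3
n = N // k

populations = [[mu + s * sd for s, sd in zip(signs, sigmas)]
               for signs in product((-1, 1), repeat=N)]
avg_v_sy = sum(v_sy(y, k) for y in populations) / len(populations)
avg_v_ran = sum(v_ran(y, n) for y in populations) / len(populations)

# Closed form from the proof: (N - n) / (N^2 n) times the sum of variances.
closed_form = Fraction(N - n, N * N * n) * sum(sd * sd for sd in sigmas)
print(avg_v_sy, avg_v_ran, closed_form)
```

The two-point distribution is chosen only because it allows exact
enumeration; the theorem itself needs no distributional form beyond
the stated moments.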


8.6 Populations with linear trend. If the population consists solely
of a linear trend, as illustrated in figure 8.2, it is fairly easy to guess
the nature of the results. From figure 8.2, it looks as if $V_{sy}$ and
$V_{st}$ (with one unit per stratum) will both be smaller than $V_{ran}$.
Further, $V_{sy}$ will be larger than $V_{st}$, because, if the systematic
sample is too low in one stratum, it is too low in all strata, whereas
stratified random sampling gives an opportunity for within-stratum
errors to cancel.


[FIGURE 8.2 Systematic sampling in a population with linear trend.
Vertical axis: $y_i$; horizontal axis: i. x = systematic sample,
o = stratified random sample.]


To examine the effects mathematically, we may assume that $y_i = i$.
We have

$$\sum_{i=1}^{N} i = \frac{N(N+1)}{2}, \qquad \sum_{i=1}^{N} i^2 = \frac{N(N+1)(2N+1)}{6}$$

The population variance $S^2$ is given by

$$S^2 = \frac{1}{(N-1)} \left[\sum_{i=1}^{N} y_i^2 - N\bar{Y}^2\right] = \frac{1}{(N-1)} \left[\frac{N(N+1)(2N+1)}{6} - \frac{N(N+1)^2}{4}\right] = \frac{N(N+1)}{12} \qquad (8.6)$$

Hence the variance of the mean of a simple random sample is

$$V_{ran} = \frac{(N-n)}{N} \frac{S^2}{n} = \frac{n(k-1)}{nk} \cdot \frac{nk(N+1)}{12n} = \frac{(k-1)(N+1)}{12} \qquad (8.7)$$

To find the variance within strata, $S_{wst}^2$, we need only replace N by
k in (8.6). This gives

$$V_{st} = \frac{(N-n)}{N} \frac{S_{wst}^2}{n} = \frac{n(k-1)}{nk} \cdot \frac{k(k+1)}{12n} = \frac{(k^2-1)}{12n} \qquad (8.8)$$

For systematic sampling, the mean of the second sample exceeds
that of the first by 1, while the mean of the third exceeds that of the
second by 1, and so on. Thus the means $\bar{y}_u$ may be replaced by the
numbers 1, 2, ..., k. Hence, by a further application of (8.6),

$$\sum_{u=1}^{k} (\bar{y}_u - \bar{Y})^2 = \frac{k(k^2-1)}{12}$$

This gives

$$V_{sy} = \frac{1}{k} \sum_{u=1}^{k} (\bar{y}_u - \bar{Y})^2 = \frac{(k^2-1)}{12} \qquad (8.9)$$

From the formulas (8.7), (8.8), and (8.9) we deduce, as anticipated,

$$V_{st} = \frac{k^2-1}{12n} \le V_{sy} = \frac{k^2-1}{12} \le V_{ran} = \frac{(k-1)(N+1)}{12}$$

Equality occurs only when n = 1. Thus, for removing the effect
of a linear trend, suspected or unsuspected, the systematic sample is
much more effective than the simple random sample, but less effective
than the stratified random sample.
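
The closed forms (8.7), (8.8), and (8.9) can be compared with a direct
computation on the population y_i = i. The following sketch assumes
N = nk and k of at least 2; the function name and the choice n = 4,
k = 5 are hypothetical.

```python
from fractions import Fraction

def variances_linear_trend(n, k):
    """Direct computation of V_st, V_sy, V_ran for y_i = i, N = n*k."""
    N = n * k
    y = list(range(1, N + 1))
    Ybar = Fraction(N + 1, 2)
    S2 = sum((v - Ybar) ** 2 for v in y) / (N - 1)
    V_ran = Fraction(N - n, N) * S2 / n

    # Strata are consecutive blocks of k units; one unit per stratum.
    strata = [y[j * k:(j + 1) * k] for j in range(n)]
    S2_wst = sum((v - Fraction(sum(s), k)) ** 2
                 for s in strata for v in s) / (n * (k - 1))
    V_st = Fraction(N - n, N) * S2_wst / n

    # Systematic sampling: variance of the k sample means.
    V_sy = sum((Fraction(sum(y[i::k]), n) - Ybar) ** 2 for i in range(k)) / k
    return V_st, V_sy, V_ran

n, k = 4, 5
N = n * k
V_st, V_sy, V_ran = variances_linear_trend(n, k)
print(V_st, V_sy, V_ran)
print(Fraction(k * k - 1, 12 * n), Fraction(k * k - 1, 12),
      Fraction((k - 1) * (N + 1), 12))
```

For these values V_sy is n times V_st, which shows how much the
systematic sample loses to the stratified sample under a pure linear
trend.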


8.7 End corrections. The poor performance of systematic sampling 
in the presence of a linear trend can be improved in several ways. 
One is to use a centrally located sample. Another is to change the 
estimate from an unweighted to a weighted mean in which all in- 
ternal members of the sample receive the usual weight 1/n, but the 
first and the last members receive weights that are in general differ- 
ent from 1/n. Such weights are called end corrections. 

As before, we select the systematic sample by choosing a random
number i between 1 and k. The weights assigned to the two end
members depend on the value of i which was chosen. In computing the
weighted sample total, before division by n to obtain the weighted
sample mean, the weights are as follows:

First member: $1 + \dfrac{n(2i - k - 1)}{2(n-1)k}$

Last member: $1 - \dfrac{n(2i - k - 1)}{2(n-1)k}$

For any value of i, the two weights always add to 2.
Example. Suppose that n = 4, k = 3, N = 12. The weighted
means for the three possible systematic samples are as follows:

$$i = 1: \quad \bar{y}_{wsy} = \tfrac{1}{4}\left(\tfrac{5}{9} y_1 + y_4 + y_7 + \tfrac{13}{9} y_{10}\right)$$

$$i = 2: \quad \bar{y}_{wsy} = \tfrac{1}{4}\left(y_2 + y_5 + y_8 + y_{11}\right)$$

$$i = 3: \quad \bar{y}_{wsy} = \tfrac{1}{4}\left(\tfrac{13}{9} y_3 + y_6 + y_9 + \tfrac{5}{9} y_{12}\right)$$

A crude rationalization of the system of weighting is that, when
i = 1, the first member of the sample, $y_1$, receives a reduced weight
because it is at one end of the finite population; $y_{10}$, on the other
hand, receives an increased weight because it "represents" the
observations $y_9$, $y_{11}$, and $y_{12}$ which are nearer to it than to any
other member of this sample. End corrections are analogous to the
coefficients 1/2 which are assigned to the two end terms in the
Euler-Maclaurin formula for numerical integration.

It is easily shown that, in a population which consists solely of a
linear trend, $\bar{y}_{wsy}$ always gives the correct population mean.
Thus $V(\bar{y}_{wsy}) = 0$. Let the linear population be represented as
before by $y_j = j$, and consider the systematic sample which starts at i.

$$n\bar{y}_{wsy} = i + \frac{n(2i-k-1)}{2(n-1)k}\, i + \{i + k\} + \cdots + \{i + (n-2)k\} + \{i + (n-1)k\} - \frac{n(2i-k-1)}{2(n-1)k}\, \{i + (n-1)k\}$$

$$= ni + \frac{kn(n-1)}{2} - \frac{n(2i-k-1)}{2}$$

$$= \frac{n(kn+1)}{2}$$

Hence $\bar{y}_{wsy} = (kn + 1)/2 = \bar{Y}$, irrespective of the value of i.
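
The end-corrected mean is simple to compute. The sketch below assumes
the weights 1 ± n(2i - k - 1)/{2(n - 1)k} on the first and last members
of the sample total, with N = nk and n of at least 2; the function name
`end_corrected_mean` is hypothetical. It reproduces the weights 5/9 and
13/9 of the example and confirms that a pure linear trend is estimated
without error for every start.

```python
from fractions import Fraction

def end_corrected_mean(y, k, i):
    """Weighted mean of the systematic sample starting at unit i
    (1-based), with end corrections on the first and last members."""
    n = len(y) // k
    sample = y[i - 1::k]
    a = Fraction(n * (2 * i - k - 1), 2 * (n - 1) * k)
    weights = [1 + a] + [Fraction(1)] * (n - 2) + [1 - a]
    mean = sum(w * v for w, v in zip(weights, sample)) / n
    return mean, weights

n, k = 4, 3
y = list(range(1, n * k + 1))            # linear trend y_j = j
results = [end_corrected_mean(y, k, i) for i in (1, 2, 3)]
for mean, weights in results:
    print(mean, [str(w) for w in weights])
```

The unweighted means of the same three samples are 11/2, 13/2 and 15/2,
so the correction removes the whole of the between-sample variation in
this case.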

These end corrections completely remove the effect of any linear 
trend in a population. In actual populations more complex types of 
trend may be present, as well as “random” variations that are inde- 
pendent from one member of the series to another. So far as the in- 
dependent variations are concerned, end corrections result in a slight 
loss of precision, for, if the $y_j$ vary independently with the same
variance $S^2$, we have

$$V(\bar{y}_{wsy}) = S^2 \sum_j w_j^2$$

where $w_j$ is the weight attached to any $y_j$. With $\bar{y}_{sy}$, the
unweighted mean, $\sum w_j^2 = 1/n$. With $\bar{y}_{wsy}$, $\sum w_j^2$
depends on the starting member i of the sample. The average value of
$\sum w_j^2$, taken over the range i = 1, 2, ..., k, is found to be

$$\frac{1}{n} \left[1 + \frac{n(k^2-1)}{6(n-1)^2 k^2}\right]$$

The inflation of the variance is negligible except for small n. For
n = 10, the inflation factor is at most about 2 per cent.
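
The average of $\sum w_j^2$ can be obtained by a short calculation. The
following is a sketch, in which the symbol $a_i = n(2i-k-1)/\{2(n-1)k\}$
for the end correction is introduced here only for convenience and is
not notation used elsewhere in the text.

```latex
\begin{align*}
\sum_j w_j^2
  &= \frac{1}{n^2}\left[(n-2) + (1+a_i)^2 + (1-a_i)^2\right]
   = \frac{1}{n}\left[1 + \frac{2a_i^2}{n}\right], \\
\frac{1}{k}\sum_{i=1}^{k} (2i-k-1)^2
  &= \frac{4}{k}\sum_{i=1}^{k}\left(i - \frac{k+1}{2}\right)^2
   = \frac{k^2-1}{3}, \\
\frac{1}{k}\sum_{i=1}^{k}\sum_j w_j^2
  &= \frac{1}{n}\left[1 + \frac{2}{n}\cdot\frac{n^2}{4(n-1)^2k^2}\cdot\frac{k^2-1}{3}\right]
   = \frac{1}{n}\left[1 + \frac{n(k^2-1)}{6(n-1)^2k^2}\right].
\end{align*}
```

For n = 10 the second term in the bracket is $10(k^2-1)/(486k^2)$, which
is below 10/486, or about 0.021, for every k. This agrees with the
figure of about 2 per cent.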
End corrections were first proposed by Yates (1948), who assigned
slightly different weights

$$1 \pm \frac{(2i - k - 1)}{2k}$$

to the first and last members. These differ from the weights given
previously only by a factor (n - 1)/n. In tests of the efficacy of his
end corrections in five natural populations (described in table 8.6)
Yates found a worth-while increase in precision in four of the five cases.


8.8 Populations with periodic variation. If the population consists of
a periodic trend, e.g. a simple sine curve, the effectiveness of the
systematic sample depends on the value of k. This may be seen
pictorially in figure 8.3. In this representation, the height of the
curve is the observation $y_i$. The sample points A represent the case
least favorable to the systematic sample. This case holds whenever k
is equal to the period of the sine curve or is an integral multiple of
the period. Every observation within the systematic sample is exactly
the same, so that the sample is no more precise than a single
observation taken at random from the population.


[FIGURE 8.3 Periodic variation.]


The most favorable case (sample B) occurs when k is an odd mul- 
tiple of the half-period. Every systematic sample has a mean exactly 
equal to the true population mean, since successive deviations above 
and below the middle line cancel. The sampling variance of the mean 
is therefore zero. Between these two cases the sample has various
degrees of effectiveness, depending on the relation between k and the
wavelength.
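
The two extreme cases can be seen numerically. The sketch below builds
a hypothetical sine-curve population with a period of 8 units and
N = 48, and evaluates the variance of the systematic sample mean for
three sampling intervals. The function names and the particular
numbers are assumptions made for illustration.

```python
import math

def sample_means(y, k):
    """Means of the k possible every-kth systematic samples."""
    return [sum(y[i::k]) / len(y[i::k]) for i in range(k)]

def v_sy(y, k):
    Ybar = sum(y) / len(y)
    return sum((m - Ybar) ** 2 for m in sample_means(y, k)) / k

period, N = 8, 48                       # six complete periods
y = [math.sin(2 * math.pi * i / period) for i in range(N)]

# Variance of a single observation taken at random from the population.
Ybar = sum(y) / N
single_obs_var = sum((v - Ybar) ** 2 for v in y) / N

worst = v_sy(y, k=8)     # k equal to the period (case A)
best = v_sy(y, k=4)      # k equal to the half-period (case B)
also_best = v_sy(y, k=12)  # k equal to three half-periods
print(single_obs_var, worst, best, also_best)
```

With k = 8 the variance equals that of one randomly chosen unit,
although n = 6 units are measured; with k = 4 or 12 it vanishes apart
from rounding error.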


8.9 Autocorrelated populations. With many natural populations, 
there is reason to expect that two observations $y_i$, $y_j$ will be more
nearly alike when i and j are close together in the series than when
they are distant. This happens whenever natural forces induce a
slow change as we proceed along the series. In a mathematical model
for this effect, we may suppose that $y_i$ and $y_j$ are positively correlated,
the correlation between them being a function solely of their distance
apart, i - j, and diminishing as this distance increases. Although this
model is oversimplified, it may represent one of the salient features of 
many natural populations. 

In order to investigate whether this model does apply to a population,
we can calculate the set of correlations $\rho_u$ for items that are
distant u units apart, and plot this correlation against u. This curve,
or the function which it represents, is called a correlogram. Even if
the model is valid, the correlogram will not be a smooth function for
any finite population, because irregularities are introduced by the
finite nature of the population. In a comparison of systematic with
stratified random sampling for this model, these irregularities make it
difficult to derive results for any single finite population. The
comparison can be made over the average of a whole series of finite
populations, which are drawn at random from an infinite
super-population to which the model applies. This technique has
already been applied in theorem 8.5.
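
A correlogram for a finite series can be computed in several slightly
different ways. The sketch below uses one common convention, which is
an assumption and not a definition taken from the text: the average
cross-product of deviations for the N - u pairs that are u units apart,
divided by the average squared deviation. The series and the function
name are hypothetical.

```python
def correlogram(y, max_lag):
    """Serial correlations r_0, r_1, ..., r_max_lag of the series y."""
    N = len(y)
    Ybar = sum(y) / N
    d = [v - Ybar for v in y]
    variance = sum(x * x for x in d) / N
    r = []
    for u in range(max_lag + 1):
        # Average cross-product over the N - u pairs at distance u.
        cross = sum(d[i] * d[i + u] for i in range(N - u)) / (N - u)
        r.append(cross / variance)
    return r

# A hypothetical slowly rising series with some irregularity.
y = [3, 4, 6, 7, 7, 9, 10, 12, 11, 13, 15, 14]
r = correlogram(y, max_lag=4)
print([round(v, 3) for v in r])
```

Plotting r against the lag gives the correlogram. For a short series
such as this one the curve is irregular, as the text warns.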

Thus we assume that the observations $y_i$ (i = 1, 2, ..., N) are
drawn from a super-population in which

$$\mathcal{E}(y_i) = \mu, \qquad \mathcal{E}(y_i - \mu)^2 = \sigma^2, \qquad \mathcal{E}(y_i - \mu)(y_{i+u} - \mu) = \rho_u \sigma^2 \qquad (8.10)$$

where

$$\rho_u \ge \rho_v \ge 0, \quad \text{whenever } u < v$$

The drawing of one set of $y_i$ from this super-population creates a
single finite population of size N.
The average variance for systematic sampling is denoted by

$$V_{sy} = \mathcal{E}(\bar{y}_{sy} - \bar{Y})^2$$


For this class of populations it is easy to show that stratified ran- 
dom sampling is superior to simple random sampling, but no general 
result can be established about systematic sampling. Within the class 
there are super-populations in which systematic sampling is superior 
to stratified random sampling, but there are also super-populations 
in which systematic sampling is inferior to simple random sampling 
for certain values of k. 

A general theorem can be obtained if it is further assumed that the 
correlogram is concave upwards. 

Theorem 8.6 If, in addition to conditions (8.10), we have

$$\delta_i^2 = \rho_{i+1} + \rho_{i-1} - 2\rho_i \ge 0 \qquad [i = 2, 3, \ldots, (kn - 2)]$$

then

$$V_{sy} \le V_{st} \le V_{ran}$$

for any size of sample. Further, unless $\delta_i^2 = 0$, i = 2, 3, ..., (kn - 2),

$$V_{sy} < V_{st}$$

A proof has been given by Cochran (1946) and will not be repeated
here.
Quenouille (1949) has shown that the inequalities in theorem 8.6
remain valid when two of the conditions are relaxed so that

$$\mathcal{E}(y_i) = \mu_i, \qquad \mathcal{E}(y_i - \mu_i)^2 = \sigma_i^2$$


In this event each of the three average variances is increased by the
same amount.

So far as practical applications are concerned, correlograms which
are concave upwards have been proposed by several writers as models
for specific natural populations. The function $\rho_u = \tanh(u^{-3/5})$ was
suggested by Fisher and Mackenzie (1922) for the correlation between
the weekly rainfall at two weather stations which are distant
u apart, the function $\rho_u = e^{-\lambda u}$ by Osborne (1942) for forestry and
land use surveys, and the function $\rho_u = (l - u)/l$ by Wold (1938) for
certain types of economic time series.


8.10 Natural populations. Investigations have been made on a variety
of natural populations. The data are described in table 8.6.


TABLE 8.6 NATURAL POPULATIONS USED IN STUDIES OF SYSTEMATIC SAMPLING


Reference          N       Type of data

Yates (1948),      288     Altitudes read at intervals of 0.1 mile from
  table 13                 ordnance survey map.
Osborne (1942)     *       Per cent of area in (i) cultivated land,
                           (ii) shrub, (iii) grass, (iv) woodland on
                           parallel lines drawn on a cover-type map.
Osborne (1942)     *       Per cent of area in Douglas fir on parallel
                           lines drawn on a cover-type map.
Yates (1948)       192     Soil temperature (12 in. under grass) for 192
                           consecutive days.
Yates (1948)       192     Soil temperature (4 in. under bare soil) for
                           192 days.
Yates (1948)       192     Air temperature for 192 days.
Yates (1948)       96      Yields of 96 rows of potatoes.
Finney (1948)      160     Volume of salable timber per strip, 3 chains
                           wide and of varying length (Mt. Stuart forest).
Finney (1948)      288     Volume of virgin timber per strip, 2.5 chains
                           wide, 80 chains long (Black's Mountain forest).
Finney (1950)      292     Volume of timber per strip, 2 chains wide and
                           of varying length (Dehra Dun forest).
Johnson (1943)     400 †   Number of seedlings per 1-ft-bed-width in 4
                           beds of hardwood seedbed stock.
Johnson (1943)     400 †   Number of seedlings per 1-ft-bed-width in 3
                           beds of coniferous seedbed stock.
Johnson (1943)     400 †   Number of seedlings per 1-ft-bed-width in 6
                           beds of coniferous transplant stock.

* Theoretically, N is infinite, if lines that are infinitely thin can be envisaged.
† Approximately. The number varied from bed to bed.


The first three studies were made from maps. In the first study, the
finite population consists of 288 altitudes at successive distances of
0.1 mile in undulating country. In the next two, the data are the
fractions of the lengths of lines drawn on a cover-type map that lie in
a certain type of cover (e.g. grass). These examples might be
considered the closest to continuous variation in the mathematical
sense.

The next three studies are based on temperatures for 192 consecutive 
days: (i) 12 in. under the soil, (ii) 4 in. under the soil, (iii) in air. 
This trio represents a gradation in the direction of greater influence of 
erratic day-to-day changes in the weather as compared with slow 
seasonal influences. 

The remaining studies deal with plant or tree yields in some se- 
quence that lies along a line. In the study on potatoes, which is typi- 
cal of the group, the finite population consists of the total yields of 96 
rows in a field. Since no exhaustive search of the literature has been 
made, further data may be available. 

In some of the studies, $V_{sy}$ is compared with the variance $V_{st2}$ for a
stratified random sample with strata of size 2k and 2 units per stratum.
This comparison is of interest because an unbiased estimate of $V_{st2}$
can be obtained from the sample data. This cannot be done for $V_{st1}$
(with strata of size k and 1 unit per stratum) or for $V_{sy}$. Other writers
report comparisons of $V_{sy}$ with both $V_{st1}$ and $V_{st2}$. The majority of
the sources do not present comparisons with $V_{ran}$ in readily usable
form, but it appears that in general $V_{st1}$ gave gains in precision over
$V_{ran}$.

In the papers by Yates and Finney, comparisons are given for a
range of values of n and k within each finite population. In these
cases the data in table 8.7 are the geometric means of the variance
ratios for the individual values of k. The other writers make
computations for only one value of k per population, but may give data
for different items or for several populations of the same natural type.
Here, again, geometric means of the variance ratios were taken.


TABLE 8.7 RELATIVE PRECISION OF SYSTEMATIC AND STRATIFIED
RANDOM SAMPLING

                                              Relative precision of
                                             systematic to stratified
                                   Range
Data                               of k      V_st1/V_sy    V_st2/V_sy

Altitudes                          2-20         2.99          5.68
Per cent area (4 cover types)      ...          ...           4.42
Per cent area (Douglas fir)        ...          ...           1.83
Soil temperature (12 in.)          2-24         2.42          4.23
Soil temperature (4 in.)           4-24         1.45          2.07
Air temperature                    4-24         1.26          1.65
Potatoes                           3-16         1.37          1.90
Timber volume (Mt. Stuart)         2-32         1.07          1.35
Timber volume (Black's Mt.)        2-24         1.19          1.44
Timber volume (Dehra Dun)          2-32         1.39          1.89
Hardwood seedlings                 14           ...           1.89
Coniferous seedlings               14-24        ...           2.22
Coniferous transplant              12-22        ...           0.93

Although the data are limited in extent, the results are impressive. 
In the studies which permit comparison with V,,1, systematic sam- 
pling shows a consistent gain in precision which, although modest, is 
worth having. The gains in comparison with Vee are substantial. 
The internal trend of the results agrees with expectations, although 
not too much should be made of this in view of the small number of 
studies. The gains are largest for the types of data in which we would 
guess that variation would be nearest to continuous, The decline in 
Vsu/Vsy from soil to air temperatures would also be anticipated from 
this viewpoint. In the last three items (forest nursery data), the only 


one showing no gain is coniferous transplant stock, which is older and 
more uniform than seedling stock. 


8.11 Quasi-periodic effects. In most of these studies, the variance
ratios $V_{st}/V_{sy}$ were reasonably stable for the different values of k
that were examined. Exceptions are the Dehra Dun forest studied by
Finney and one bed of hardwood seedling stock which has been studied
by L. H. Madow (1946). (There may be more exceptions, because
some studies included only one value of k.)

In the exceptions, $V_{sy}$ changed with varying sample size in a
manner that suggests the presence of something approaching a hidden
periodicity. Table 8.8 shows data presented by L. H. Madow. The
bed was 420 ft long, and the unit was 1 ft of the bed width.


TABLE 8.8 DATA SUGGESTING A QUASI-PERIODIC EFFECT

  k     n     $V_{sy}$    $V_{st1}$    $V_{ran}$    $V_{st1}/V_{sy}$
 42    10     4.21     7.21     10.29     1.71
 20    21     3.06     3.00      4.77     0.98
 15    28     2.42     2.09      3.52     0.86
 14    30     0.69     1.90      3.26     2.75
 10    42     1.74     1.29      2.26     0.74
  7    60     0.26     0.82      1.51     3.15
  5    84     1.22     0.50      1.00     0.41


When k is a multiple of 7, the precision of the systematic sample is
high relative to that of stratified sampling. For intermediate values,
the precision is about the same or is lower. The erratic behavior of
$V_{sy}$ with increasing sample size is another reflection of this
phenomenon. Finney’s data (1950) exhibit a similar effect.

How frequently this effect occurs in natural populations is not
known: it is presumably less likely in a long series (i.e. N large) than
in a short series. It should be observed, however, that over the range
of values of k from 5 to 42 in table 8.8, systematic sampling has done,
on the whole, at least as well as stratified random sampling. The
criticism is therefore that the performance of systematic sampling is
unpredictable, not that it is uniformly poor.
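
The pattern in table 8.8 can be checked with a few lines of arithmetic. The sketch below recomputes the ratio of the stratified to the systematic variance for each k, separates the multiples of 7 from the other values, and takes a geometric mean over all seven rows as one possible reading of "on the whole." The function names are illustrative only, and the choice of a geometric mean as the summary is an assumption borrowed from the treatment of table 8.7.

```python
from math import prod

# Rows of table 8.8: (k, n, V_sy, V_st1, V_ran)
TABLE_8_8 = [
    (42, 10, 4.21, 7.21, 10.29),
    (20, 21, 3.06, 3.00, 4.77),
    (15, 28, 2.42, 2.09, 3.52),
    (14, 30, 0.69, 1.90, 3.26),
    (10, 42, 1.74, 1.29, 2.26),
    (7, 60, 0.26, 0.82, 1.51),
    (5, 84, 1.22, 0.50, 1.00),
]


def relative_precision(rows):
    """Return {k: V_st1 / V_sy}; values above 1 favor systematic sampling."""
    return {k: v_st1 / v_sy for k, _n, v_sy, v_st1, _v_ran in rows}


def geometric_mean(values):
    """Geometric mean of a collection of positive numbers."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


ratios = relative_precision(TABLE_8_8)
multiples_of_7 = {k: r for k, r in ratios.items() if k % 7 == 0}
other_values = {k: r for k, r in ratios.items() if k % 7 != 0}
overall = geometric_mean(ratios.values())

for k, r in sorted(ratios.items(), reverse=True):
    print(f"k = {k:2d}   V_st1/V_sy = {r:.2f}")
print(f"geometric mean over all k = {overall:.2f}")
```

The ratios for k = 42, 14, and 7 all exceed 1, those for the other values of k all fall below 1, and the geometric mean over the seven rows is somewhat above 1.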


8.12 Estimation of the variance from a single sample. From the
results of a simple random sample with n > 1, we can calculate an
unbiased estimate of the variance of the sample mean, the estimate
being unbiased whatever the form of the population. Since a systematic
sample can be regarded as a simple random sample with n = 1, this
useful property does not hold for the systematic sample. As an
illustration, consider the “sine curve” example. Let

$$y_i = m + a \sin(\pi i/2)$$

where k = 4 and i = 1, 2, ..., 4n. The successive observations in
the population are

$$(m + a),\ m,\ (m - a),\ m,\ (m + a),\ m,\ (m - a),\ m,\ \ldots$$

If i = 1 is chosen as the first member, all members of the systematic
sample have the value (m + a). For the other three possible choices
of i, all members have the values m, (m - a), or m, respectively.
Thus from a single sample we have no means of finding out or
estimating the value of a. But the true sampling variance of the mean of the
systematic sample is $a^2/2$. The illustration shows that it is impossible
to construct an estimated variance that is unbiased if periodic
variation is present.
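
A small numerical sketch of the sine curve example may make the point concrete. It builds the population for illustrative values of m, a, and n (these values are assumptions, not taken from the text), lists the four possible systematic samples, and compares the true variance of the sample mean with what the simple random sample formula would report from each single sample.

```python
import math


def sine_population(m, a, n, k=4):
    """y_i = m + a*sin(pi*i/2) for i = 1..k*n; rounding removes float noise."""
    return [round(m + a * math.sin(math.pi * i / 2), 12)
            for i in range(1, k * n + 1)]


def systematic_samples(y, k):
    """All k possible 1-in-k systematic samples of the sequence y."""
    return [y[start::k] for start in range(k)]


def srs_variance_estimate(sample, N):
    """Simple-random-sample estimate of the variance of the sample mean."""
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((v - mean) ** 2 for v in sample) / (n - 1)
    return (N - n) / (N * n) * s2


# Illustrative values only.
m, a, n, k = 10.0, 3.0, 5, 4
y = sine_population(m, a, n, k)
samples = systematic_samples(y, k)
means = [sum(s) / len(s) for s in samples]

# True variance: the k sample means are equally likely.
grand_mean = sum(means) / k
true_variance = sum((mu - grand_mean) ** 2 for mu in means) / k

# What each single sample would report under the SRS formula.
estimates = [srs_variance_estimate(s, len(y)) for s in samples]

print("sample means:", means)
print("true variance of the mean:", true_variance, " a^2/2 =", a ** 2 / 2)
print("single-sample estimates:", estimates)
```

Every one of the four samples is internally constant, so each reports an estimated variance of zero while the true variance is $a^2/2$.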

These results do not mean that nothing can be done. Excluding 
the case of periodic variation, we might know enough about the struc- 
ture of the population to be able to develop a mathematical model 
which adequately represents the type of variation that is present. 
We might then be able to manufacture a formula for the estimated 
variance that is approximately unbiased for this model, although it 
may be badly biased for other models. The decision to use one of 
these models must rest on the judgment of the sampler. Unfortu- 
nately, we frequently lack data, as distinct from opinions, about the 
structure of the population and are not confident that a given model 
will be satisfactory. 

Some simple models with their corresponding estimated variances
are illustrated below. No proofs will be given.

The simplest models apply to populations in which $y_i$ is composed
of a trend plus a “random” component. Thus

$$y_i = \mu_i + e_i$$

where $\mu_i$ is some function of i. For the random component, we assume
that there is a super-population in which

$$\mathcal{E}(e_i) = 0, \qquad \mathcal{E}(e_i^2) = \sigma_i^2, \qquad \mathcal{E}(e_i e_j) = 0 \quad (i \neq j)$$

A proposed formula $s_{sy}^2$ for the estimated variance is called unbiased if

$$\mathcal{E}E(s_{sy}^2) = \mathcal{E}V_{sy}$$

i.e. if it is unbiased over all finite populations that can be drawn from
the super-population.

I. Population in “random” order.

$$\mu_i = \text{constant} \qquad (i = 1, 2, \ldots, N)$$

$$s_{sy1}^2 = \frac{N-n}{Nn}\,\frac{\sum (y_i - \bar{y}_{sy})^2}{n-1}$$

This case applies when we are confident that the order is essentially
random with respect to the items being measured. The variance
formula is the same as for a simple random sample and is unbiased if the
model is correct.

II. Stratification effects only.

$$\mu_i = \text{constant} \qquad (rk + 1 \le i \le rk + k)$$

$$s_{sy2}^2 = \frac{N-n}{Nn}\,\frac{\sum (y_i - y_{i+1})^2}{2(n-1)}$$

In this case the mean is constant within each stratum of k units. The
estimate $s_{sy2}^2$, which is based on the mean square successive
difference, is not unbiased. It contains an unwanted contribution from the
difference between $\mu$’s in neighboring strata, and the first and last
strata carry too little weight in estimating the random component of
the variance. With a reasonably large sample, this estimate would in
general be too high, assuming that the model is correct.

III. Linear trend.

$$\mu_i = \mu + \beta i$$

$$s_{sy3}^2 = \frac{N-n}{N}\,\frac{n'}{n^2}\,\frac{\sum (y_i - 2y_{i+1} + y_{i+2})^2}{6(n-2)}$$

The estimate is based on successive quadratic terms in the sequence
$y_i$. The sum of squares contains (n - 2) terms. With a linear trend
we have seen (section 8.7) that the trend can be eliminated by the
use of end corrections. The term $n'/n^2$ is the sum of squares of the
weights in $\bar{y}_{sy}$. Unless n is small, $n'/n^2$ can be replaced by the usual
factor 1/n. Because the strata at the ends receive too little weight,
the estimate is not unbiased unless $\sigma_i$ is constant, but it should be
satisfactory if n is large and the model is correct.
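
The three formulas are easy to compare side by side. The following sketch implements them for a single systematic sample given in its order of selection; the function names are illustrative, and in the third formula the factor $n'/n^2$ is replaced by 1/n, which the text permits only when n is not small.

```python
def _fpc_over_n(N, n):
    """The common factor (N - n) / (N n)."""
    return (N - n) / (N * n)


def s2_sy1(y, N):
    """Model I (random order): the simple-random-sample formula."""
    n = len(y)
    ybar = sum(y) / n
    return _fpc_over_n(N, n) * sum((v - ybar) ** 2 for v in y) / (n - 1)


def s2_sy2(y, N):
    """Model II (stratification effects): mean square successive difference."""
    n = len(y)
    ss = sum((y[i] - y[i + 1]) ** 2 for i in range(n - 1))
    return _fpc_over_n(N, n) * ss / (2 * (n - 1))


def s2_sy3(y, N):
    """Model III (linear trend): successive quadratic terms.

    Assumption: n'/n^2 is approximated by 1/n (no end corrections).
    """
    n = len(y)
    ss = sum((y[i] - 2 * y[i + 1] + y[i + 2]) ** 2 for i in range(n - 2))
    return _fpc_over_n(N, n) * ss / (6 * (n - 2))


# A sample lying exactly on a straight line, from a population of N = 100.
N = 100
y = [3 + 2 * i for i in range(10)]
v1, v2, v3 = s2_sy1(y, N), s2_sy2(y, N), s2_sy3(y, N)
print(v1, v2, v3)
```

For a sample that lies exactly on a line, the first formula treats the whole trend as random variation, the second retains the step between neighbors, and the third removes the trend entirely.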

If continuous variation of a more complex type is present, the
preceding formulas may all give poor results. In table 8.9 the second
and third formulas are applied to six forest nursery beds (Johnson,
1943). The quadratic formula is slightly better than that based on
successive differences, but both give serious overestimates.


TABLE 8.9 VARIANCES OF SAMPLE MEAN NUMBERS OF SEEDLINGS
(JOHNSON’S DATA)

                          Actual
                  Bed     $V_{sy}$     $s_{sy2}^2$    $s_{sy3}^2$
Silver maple      1       0.91      2.8       2.5
                  2       0.74      3.6       2.9
American elm      1       4.8      28.4      12.6
                  2      15.5      22.6      18.6
White spruce      1       5.5      17.2      11.2
                  2       2.0      11.6       6.4
White pine        1       8.2      21.0      21.9


Various other formulas can be devised. Residuals from a fitted
polynomial of higher degree may be effective if $\mu_i$ varies continuously
and not too rapidly: tables have been provided by DeLury (1950) for
this method.

Formulas developed from simple assumptions about the nature of
the correlogram have been discussed by Osborne (1942), Cochran
(1946), and Matérn (1947). Yates (1949) has investigated an
estimate based on a quantity of the form

$$(y_u + y_{u+2k} + y_{u+4k} + \cdots) - (y_{u+k} + y_{u+3k} + \cdots)$$

The successive items in the sample are given alternately + and -
signs. If this expression is taken over the whole sample, only 1 df
is available. In order to provide more degrees of freedom, the sample
data can be broken into parts, which Yates suggests might contain 9
observations each. If we denote the successive observations in the
systematic sample by $y_1'$, $y_2'$, etc., and give weight ½ to the first and
last terms, we may write

$$d_1 = (\tfrac{1}{2}y_1' + y_3' + y_5' + y_7' + \tfrac{1}{2}y_9') - (y_2' + y_4' + y_6' + y_8')$$

The next difference, $d_2$, may start with $y_9'$, and so on. Then, for the
estimated variance of $\bar{y}_{sy}$, we take

$$s_{sy}^2 = \frac{N-n}{Nn}\,\frac{\sum d_u^2}{7.5g}$$

The factor 7.5 is the sum of squares of the coefficients in any $d_u$, and
g is the number of differences which the sample provides (g is
approximately n/9). In the natural populations which Yates examined, a
formula of this type was superior to the formula $s_{sy2}^2$ based on
successive differences, but it still overestimated the actual variance of $\bar{y}_{sy}$.
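
A minimal sketch of this estimate is given below. It assumes that each difference spans 9 successive observations and that the next difference begins on the last observation of the previous one, which is one reading of the construction; the function names are illustrative.

```python
# Coefficients of one difference d_u: weight 1/2 on the two end terms,
# alternating signs in between.
YATES_WEIGHTS = [0.5, -1, 1, -1, 1, -1, 1, -1, 0.5]


def yates_differences(y):
    """Differences d_1, d_2, ... from successive runs of 9 observations.

    Assumption: each run starts on the last observation of the previous
    run, so runs share their end terms.
    """
    step = len(YATES_WEIGHTS) - 1
    return [
        sum(w * v for w, v in zip(YATES_WEIGHTS, y[start:start + 9]))
        for start in range(0, len(y) - 8, step)
    ]


def yates_variance(y, N):
    """Estimated variance of the systematic mean: sum(d^2) / (7.5 g) with fpc."""
    n = len(y)
    d = yates_differences(y)
    g = len(d)
    if g == 0:
        raise ValueError("at least 9 observations are needed")
    return (N - n) / (N * n) * sum(x * x for x in d) / (7.5 * g)


sum_sq_weights = sum(w * w for w in YATES_WEIGHTS)

# A linear sequence: the differences should vanish.
linear = [2.0 + 0.5 * i for i in range(17)]
d_linear = yates_differences(linear)

# A strictly alternating sequence: the differences pick up the oscillation.
alternating = [1 if i % 2 == 0 else 0 for i in range(17)]
d_alternating = yates_differences(alternating)
v_alternating = yates_variance(alternating, N=170)
print(sum_sq_weights, d_linear, d_alternating, v_alternating)
```

The half weights on the end terms make the coefficients sum to zero and remove any linear trend within a run, so the differences respond only to departures from linearity.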

In conclusion, there is no dearth of formulas for the estimated vari- 
ance, but all appear to have a limited range of applicability. 


8.13 Stratified systematic sampling. We have seen that systematic 
sampling is itself a kind of stratified sampling. In some applications 
in which stratification seems desirable, it can be introduced by the 
use of a systematic sample. Consider a sample of the blocks in a 
large city. Usually it is desirable to ensure that this sample has good 
geographic coverage in which the different types of residential area 
are all represented. If the blocks are numbered serially, proceeding 
from one side of the city to the other in a serpentine fashion, a syste- 
matic sample will often give adequate geographic stratification. 

The situation is different when the strata are distinct, so that one 
stratum does not merge into another, and when separate estimates 
are required for each stratum. Here we may take a separate syste- 
matic sample in each stratum, with a new starting point and perhaps 
a different value of k. This method will be more precise than stratified 
random sampling if systematic sampling within strata is more pre- 
cise than simple random sampling within strata. 

If $\bar{y}_{syh}$ is the mean of the systematic sample in stratum h, the
estimate of the population mean $\bar{Y}$ is, as usual,

$$\bar{y}_{stsy} = \frac{\sum N_h \bar{y}_{syh}}{N}$$

From theorem 5.2, assuming that the estimate is unbiased in each
stratum,

$$V(\bar{y}_{stsy}) = \frac{\sum N_h^2\, V(\bar{y}_{syh})}{N^2}$$

With only a few strata, the problem of finding a sample estimate of
this quantity amounts to that of finding a satisfactory sample
estimate of $V(\bar{y}_{syh})$ in each stratum, by one of the methods discussed in
the previous section.

When the number of strata exceeds 20, an estimate based on the
differences between pairs of strata may be preferable. The stratum
sizes and the sample sizes should be approximately the same for the
members of a pair. If the first two strata form a pair, then

$$E(\bar{y}_{sy1} - \bar{y}_{sy2})^2 = V(\bar{y}_{sy1}) + V(\bar{y}_{sy2}) + (\bar{Y}_1 - \bar{Y}_2)^2$$

Consequently, the estimate

$$v(\bar{y}_{stsy}) = \frac{1}{N^2} \sum{}' N_h^2 (\bar{y}_{syh} - \bar{y}_{syj})^2$$

where the sum extends over the pairs of strata, is on the average an
overestimate, even if periodic effects are present within strata. The
amount of overestimation depends on the terms in $(\bar{Y}_h - \bar{Y}_j)^2$. So
far as can be predicted, strata in the same pair should therefore have
about the same population means. This device is an application of
the method of “collapsed strata,” previously described in section 5.21.
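
The two formulas of this section can be sketched as follows. The stratum sizes and means in the example are invented for illustration, and the handling of unequal sizes within a pair (using the average of the two sizes in place of $N_h$) is an assumption of this sketch, since the text only asks that the sizes be approximately equal.

```python
def stratified_systematic_mean(N_h, ybar_h):
    """Weighted mean of the stratum systematic-sample means."""
    N = sum(N_h)
    return sum(Nh * yb for Nh, yb in zip(N_h, ybar_h)) / N


def paired_variance_estimate(N_h, ybar_h, pairs):
    """v = (1/N^2) * sum over pairs of N_h^2 * (ybar_h - ybar_j)^2.

    Assumption: when the two sizes in a pair differ slightly, their
    average stands in for N_h.
    """
    N = sum(N_h)
    total = 0.0
    for h, j in pairs:
        size = (N_h[h] + N_h[j]) / 2
        total += size ** 2 * (ybar_h[h] - ybar_h[j]) ** 2
    return total / N ** 2


# Hypothetical data: four strata collapsed into two pairs.
N_h = [100, 100, 200, 200]
ybar_h = [10.0, 12.0, 20.0, 19.0]
pairs = [(0, 1), (2, 3)]

estimate = stratified_systematic_mean(N_h, ybar_h)
variance = paired_variance_estimate(N_h, ybar_h, pairs)
print(estimate, variance)
```

Each squared difference between paired means includes the squared difference of the true stratum means, which is why the estimate errs on the high side.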


8.14 Systematic sampling in two dimensions. Some sampling prob- 
lems that are two-dimensional are handled by numbering the units so 
that a one-dimensional systematic sample can be taken. The sample 
of city blocks mentioned in the previous section is an instance. An- 
other is a method commonly employed in forestry surveys. A series 
of equidistant parallel strips is mapped out, extending the whole
width of the forest. The volume of timber in each strip is estimated,
sometimes by measuring all trees in the strip but more frequently by
measuring a sample of the trees. If $y_i$ is the total volume of timber in
the ith strip, one-dimensional theory may be applied. When the
forest varies in width, a natural modification is to regard the area $x_i$
of the ith strip as an auxiliary variate, and employ a ratio or
regression estimate.

Two forms of systematic sample for a square area are shown in 
figure 8.4. The sample on the left, which resembles a square grid, is 
completely determined by the choice of a pair of random numbers to 
fix the coordinates of the upper left unit. The sample on the right is 
also systematic because the distance, horizontal or vertical, between 
units in successive strata is always the same. However, unlike figure 
8.4a, the units do not lie on the same line. To select an unaligned 
sample of this kind, we fix the coordinates of the upper left unit by a 
pair of random numbers. Two additional random numbers determine
the horizontal coordinates of the remaining units in the first column of
strata. Another two are needed to fix the vertical coordinates of the
remaining units in the first row of strata.


(a) Aligned or “square grid” sample          (b) Unaligned sample

Figure 8.4 Two types of two-dimensional systematic sample.


Quenouille (1949) and Das (1950) have compared two-dimensional
systematic samples with various types of stratified sampling in
theoretical studies for some simple two-dimensional correlograms. The
results indicate that the unaligned systematic sample will often be
superior to stratified random sampling.

Quenouille’s analysis suggests that the square grid is not so precise 
as the unaligned sample. This suggestion is supported by a study on 
natural populations by Haynes (1948). In fourteen agricultural uni- 
formity trials, he found that the grid had only about the same pre- 
cision as simple random sampling in two dimensions. The relatively 
poor performance of the square grid is not unexpected when we con- 
sider the effect of linear gradients. If there is a pronounced linear 
gradient parallel to the horizontal side of the area, the square grid in 
figure 8.4a samples this gradient at only three points; it seems intui- 
tive that a method which samples the gradient at nine different points, 
as does the unaligned sample, will be superior. 
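
The selection rules for the two samples of figure 8.4 can be sketched in code. The sketch below assumes a square area divided into n x n strata of k x k units each, with coordinates counted from 0; the function names and the seed are illustrative. It also counts the distinct horizontal positions at which each sample meets a horizontal gradient.

```python
import random


def aligned_sample(n, k, rng):
    """Square grid: one random (row, column) offset shared by every stratum."""
    i0, j0 = rng.randrange(k), rng.randrange(k)
    return [(r * k + i0, c * k + j0) for r in range(n) for c in range(n)]


def unaligned_sample(n, k, rng):
    """Unaligned sample: the horizontal offset is fixed along each row of
    strata, and the vertical offset is fixed down each column of strata."""
    horizontal = [rng.randrange(k) for _ in range(n)]  # one per row of strata
    vertical = [rng.randrange(k) for _ in range(n)]    # one per column of strata
    return [(r * k + vertical[c], c * k + horizontal[r])
            for r in range(n) for c in range(n)]


rng = random.Random(2024)  # arbitrary seed, for reproducibility only
n, k = 3, 10
grid = aligned_sample(n, k, rng)
scattered = unaligned_sample(n, k, rng)

# Distinct horizontal positions sampled by each design.
grid_columns = sorted({col for _row, col in grid})
scattered_columns = sorted({col for _row, col in scattered})
print("aligned:  ", grid_columns)
print("unaligned:", scattered_columns)
```

With n = 3 the unaligned sample uses six random numbers in all, in agreement with the count given above, and it can meet a horizontal gradient at up to nine distinct points where the square grid meets it at exactly three.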

Further evidence for the superiority of an unaligned sample is ob- 
tained from experience in experimental design, where the latin square 
has been found a precise method for arranging treatments in a rec- 
tangular field. The 5x5 latin square in figure 8.5a may be regarded 
as a division of the field into five systematic samples, one for each 
letter. There is some evidence that this particular square, which is 
called the “knight’s move” latin square, is slightly more precise than 
a randomly chosen 5 x 5 square, probably because alignment is absent 
in the diagonals as well as in rows and columns.

The principle of the latin square has been used by Homeyer and
Black (1946) in sampling rectangular fields of oats. Each field
contained 21 plots. The three possible systematic samples are denoted
by the letters A, B, and C, respectively, in figure 8.5b. This
arrangement, with one of the letters chosen at random in each field, gave an
increase in precision of around 25 per cent over stratified random
sampling with rows as strata. The arrangement does not quite satisfy
the latin square property, because each letter appears 3 times in one
column and twice in the other columns, but it approaches this
property as nearly as possible.


        A B C D E                          A B C
        D E A B C                          B C A
        B C D E A                          C A B
        E A B C D                          A B C
        C D E A B                          B C A
                                           C A B
                                           A B C

(a) “Knight’s move” latin square     (b) Systematic design for a 3 x 7
                                         rectangular field

Figure 8.5 Two systematic designs based on the latin square.


8.15 Summary. Systematic samples are convenient to draw and to
execute. In most of the studies reported in this chapter, both on
artificial and on natural populations, they compared favorably in
precision with stratified random samples. Their disadvantages are that
they may give poor precision when unsuspected periodicity is present
and that no trustworthy method for estimating $V(\bar{y}_{sy})$ from the sample
data is known.

In the light of these results, writers on sampling are not in
agreement in their views on the advisability of systematic sampling. It
appears, however, that systematic sampling can safely be recommended
in the following situations:

i. Where the ordering of the population is essentially random, or 
contains at most a mild stratification. Here systematic sampling is 
used for convenience, with little expectation of a gain in precision. 
Sample estimates of error which are reasonably unbiased are available 
(section 8.12).

ii. Where a stratification with numerous strata is employed, and
an independent systematic sample is drawn from each stratum. The
effects of any hidden periodicities tend to cancel out in this situation,
and an estimate of error which is known to be an overestimate can be
obtained (section 8.13). Alternatively we can use half the number of
strata and draw two systematic samples, with independent random
starts, from each stratum. This method gives an unbiased estimate
of error.

iii. For subsampling the units (chapter 10). In this case it turns 
out that an unbiased estimate of the sampling error can be obtained 
in most practical situations. 

iv. For sampling populations with variation of a continuous type,
provided that an estimate of the sampling error is not regularly
required. If a series of surveys of this type is being made, an occasional
check on the sampling errors may be sufficient. Yates (1948) has
shown how this may be done by taking supplementary observations.

In conclusion, further research may extend our knowledge of the
validity and range of application of formulas which purport to
estimate $V(\bar{y}_{sy})$ and may lead to improved formulas.


8.16 Exercises. 


8.1 The data below are the numbers of seedlings for each foot of bed in a
bed 200 ft long.


                                      Feet                                             Systematic
  1-20   21-40   41-60   61-80  81-100 101-120 121-140 141-160 161-180 181-200          sample
    1       2       3       4       5       6       7       8       9      10           totals

[Twelve of the 20 rows are illegible in this copy; the only legible
fragment is a sample total of 165.]

   11      22      25      14      11      43      15      16      16       4            177
   16      26      39      24       9      27      14      18      20       9            202
    7      17      24      18      25      20      13      11       6       8            149
   22      39      25      17      16      21       9      19      15       8            191
   44      21      18      14      13      18      25      27       4       9            193
   26      14      44      38      22      19      17      29       8      10            227
   31      40      55      36      18      24  [illegible] [illegible]  8       5            255
   26      30      39      29       9      30      30      29      10       3            235

Strata
totals
  410     459     674     551     325     528     303     358     342  [illegible]


Find the variance of the mean of a systematic sample consisting of every
20th foot. Compare this with the variances for (i) a simple random sample,
(ii) a stratified random sample with 2 units per stratum, (iii) a stratified
random sample with 1 unit per stratum. All samples have n = 10.
[$\sum (y_i - \bar{Y})^2$ = 23,601.]

8.2 For the population in exercise 8.1, is the precision of systematic sam- 
pling improved by end corrections? 

8.3 A two-dimensional population with a linear trend may be represented
by the relation

$$y_{ij} = i + j \qquad (i, j = 1, 2, \ldots, nk)$$

where $y_{ij}$ is the item value in the ith row and jth column. The population
contains $N^2 = n^2k^2$ units.

A systematic square grid sample is selected by drawing at random two
independent starting coordinates $i_0$, $j_0$, each between 1 and k. The sample, of
size $n^2$, contains all units whose coordinates are of the form

$$i_0 + \gamma k, \qquad j_0 + \delta k$$

where $\gamma$, $\delta$ are any two integers between 0 and (n - 1), inclusive.

Show that the mean of this sample has the same precision as the mean of a
simple random sample of size $n^2$.

8.4 If the comparison in exercise 8.3 were made for a three-dimensional 
population with linear trend, what result would you expect? 

8.5 A population of 360 households (numbered from 1 to 360) in Baltimore 
is arranged alphabetically in a file by the surname of the head of the house- 
hold. Households in which the head is non-white occur at the following 
numbers: 28, 31-33, 36-41, 44, 45, 47, 55, 56, 58, 68, 69, 82, 83, 85, 86, 89-94, 
98, 99, 101, 107-110, 114, 154, 156, 178, 223, 224, 296, 298-300, 302-304, 
306-323, 325-331, 333, 335-339, 341, 342. (The non-white households show
some “clumping” because of an association between surname and color.) 

Compare the precision of a 1 in 8 systematic sample with a simple random
sample of the same size, for estimating the proportion of households in which
the head is non-white.


CHAPTER 9
TYPE OF SAMPLING UNIT


9.1 The optimum unit. Sometimes a population can be divided into
units in various ways. A city may be regarded as composed of a num- 
ber of city blocks, or of a number of households, or of a number of 
persons. In soil sampling, the tool with which the sample is extracted 
can be constructed of various sizes and shapes, each of which creates 
a different subdivision of a field into sampling units. A change in the 
type of unit usually affects both the cost of taking the sample and the 
precision obtained from it. The determination of the optimum type 
of unit may therefore be important in the economics of sampling. 

The optimum unit is that which gives the desired precision for the 
sample estimates at the smallest cost, or the greatest precision for 
fixed cost. For a given size of sample, a large unit is nearly always less 
expensive than a small unit, but it is often less precise. The choice of
unit involves striking a balance between relative precision and rela- 
tive cost. As in most practical decisions, there may be imponderable 
factors: one type of unit may have some special convenience or disad- 
vantage that is difficult to include in a calculation of costs. In sam- 
pling a growing crop, some experiences suggest that a small unit gives 
biased estimates because of uncertainty about the exact boundaries of 
the unit. For example, Homeyer and Black (1946) found that units 
2 x 2 ft gave yields of oats about 8 per cent higher than units 3 x 3 ft, 
possibly because samplers tend to place boundary plants inside the 
unit when there is doubt. Sukhatme (1947) cites similar results for 
wheat and rice. 


9.2 A simple example. Johnson’s data (1941) for a bed of white pine 
seedlings provide a simple example of the procedure for comparing 
different units. The bed contained 6 rows, each 434 ft long. There 
are many ways in which the bed can be divided into sampling units. 
Data for four types of unit are shown in table 9.1. Since the bed was
completely counted, the data are correct population values.


TABLE 9.1 DATA FOR FOUR TYPES OF SAMPLING UNIT

                                               Type of unit
Preliminary data                       1-ft     2-ft     1-ft     2-ft
                                       row      row      bed      bed
Relative size of unit                  1        2        6        12
$N_u$ = number of units in pop.        2604     1302     434      217
$S_u^2$ = pop. variance per unit       2.537    6.746    23.094   68.558
Number of feet of row that can
  be counted in 15 min                 44       62       78       108

The units were: 

One foot of a single row. 

Two feet of a single row. 

One foot of the width of the bed. 

Two feet of the width of the bed. 

With the first two units, it was assumed that sampling would be
stratified by rows, so that the $S_u^2$ represent variances within rows.
Simple random sampling was assumed for the last two units.

Since the principal cost is that of locating and counting the units,
costs were estimated by a time study (last row of table 9.1). With
the larger units, a greater bulk of sample can be counted in 15 min,
less time being spent in moving from one unit to another.

The item to be estimated is the population total number of seedlings. 
In studies of this type, a population total is more convenient to discuss 
than a population mean, since the mean per unit for a 2-ft bed unit is 
quite a different quantity from the mean per unit for a 1-ft row unit, 
whereas the population total is the same quantity for all units. If the 
fpe is ignored, the variance of the estimated population total is 


$$\frac{N_u^2 S_u^2}{n_u}$$

where u = 1, 2, 3, 4 stands for the type of unit. This variance is to be
the same for all units. If the smallest unit is chosen as a standard, the
values of the other $n_u$ that give the same precision as the smallest unit
are obtained from the equation

$$\frac{N_u^2 S_u^2}{n_u} = \frac{N_1^2 S_1^2}{n_1}, \qquad \text{i.e.} \qquad n_u = n_1 \left(\frac{N_u}{N_1}\right)^2 \frac{S_u^2}{S_1^2}$$

For example, the value of $n_2$ comparable to $n_1$ is

$$n_2 = n_1 \left(\frac{1302}{2604}\right)^2 \frac{6.746}{2.537} = 0.665\,n_1$$

These data appear in table 9.2, first line.


TABLE 9.2 COMPARABLE SAMPLE SIZES AND COSTS

                                               Type of unit
Successive steps in
the calculation                  1-ft      2-ft          1-ft          2-ft
                                 row       row           bed           bed
Comparable values of $n_u$       $n_1$     $0.665n_1$    $0.253n_1$    $0.188n_1$
Comparable sample sizes
  (in 1-ft row units)            $n_1$     $1.330n_1$    $1.518n_1$    $2.256n_1$
Comparable costs                 $c_1$     $0.944c_1$    $0.856c_1$    $0.919c_1$
Relative net precision           100       106           117           109


The next step is to find the comparable sample sizes in terms of
single feet of row, since costs are expressed in these terms. For $n_2$ we
multiply the previous line by 2, because the unit contains 2 ft of row
(second line of table 9.2). As the size of the unit increases, the size of
sample required to obtain equal precision also increases: in fact, with
the 2-ft bed unit, the sample must be 2¼ times as large as with the
1-ft row unit.

The cost of taking $n_1$ of the smallest units may be expressed as

$$c_1 = \frac{n_1}{44}$$

since this is time required in 15-min intervals. Similarly the cost of
the sample with the second unit is

$$\frac{1.330\,n_1}{62} = 1.330 \left(\frac{44}{62}\right) c_1 = 0.944\,c_1$$

as shown in table 9.2. All the larger units cost less than the smallest
unit, although the differences are not great. The 1-ft bed unit appears
the best. The last line of table 9.2 shows the reciprocals of the costs,
with the smallest unit taken as 100. In the table these figures have
been called relative net precision, because, if the same comparisons are
made with cost kept constant instead of variance, it will be found that
these figures are inversely proportional to the relative variances of the
estimated population total, and hence measure relative precision.
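
The steps leading from table 9.1 to table 9.2 can be retraced in a few lines. The sketch below takes the four rows of table 9.1 as input and works through the comparable sample sizes, the comparable bulk in feet of row, the comparable costs, and the relative net precision; the variable and function names are illustrative.

```python
UNITS = ["1-ft row", "2-ft row", "1-ft bed", "2-ft bed"]
REL_SIZE = [1, 2, 6, 12]                   # feet of row per unit
N_U = [2604, 1302, 434, 217]               # units in the population
S2_U = [2.537, 6.746, 23.094, 68.558]      # population variance per unit
FEET_PER_15_MIN = [44, 62, 78, 108]        # feet of row counted in 15 min


def comparable_table():
    """One tuple per unit: (n_u / n_1, feet of row / n_1, cost / c_1, precision)."""
    rows = []
    for size, N, s2, feet in zip(REL_SIZE, N_U, S2_U, FEET_PER_15_MIN):
        n_ratio = (N / N_U[0]) ** 2 * s2 / S2_U[0]   # equal-variance sample size
        feet_ratio = n_ratio * size                  # same sample in feet of row
        cost_ratio = feet_ratio * FEET_PER_15_MIN[0] / feet
        rows.append((n_ratio, feet_ratio, cost_ratio, 100 / cost_ratio))
    return rows


table = comparable_table()
for name, (n_ratio, feet_ratio, cost_ratio, precision) in zip(UNITS, table):
    print(f"{name:9s} n: {n_ratio:.3f}  feet: {feet_ratio:.3f}  "
          f"cost: {cost_ratio:.3f}  precision: {precision:.0f}")

best_unit = UNITS[max(range(len(UNITS)), key=lambda u: table[u][3])]
```

Because the sketch carries full precision through every step, the entries for the 2-ft bed unit differ from the printed table in the third decimal place; the printed values come from multiplying the rounded figure 0.188.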


9.3 General procedure for comparing units. The analysis in this ex- 
ample can be expressed in more general terms as follows. 


Theorem 9.1 For the uth type of unit, let

Relative size of unit = $M_u$
Variance among the item totals on the unit = $S_u^2$
Relative cost per unit = $C_u$

Simple random sampling is assumed, with the fpe ignored, and the
population total is estimated by simple expansion. Then the relative
cost for equal precision is proportional to

$$\frac{C_u S_u^2}{M_u^2}$$

Proof: This follows the argument used in the numerical example.
For the uth unit

Number of units in the population $\propto 1/M_u$
Variance of estimated population total $\propto S_u^2/n_u M_u^2$
Sample size ($n_u$) for equal precision $\propto S_u^2/M_u^2$
Relative cost for equal precision $\propto C_u S_u^2/M_u^2$

The definitions of $S_u^2$ and $C_u$ should be noted, because in the
compilation of data it is often convenient to express these quantities
originally in some other form. Thus in the numerical example the
cost data were given in terms of the bulk of sample that could be
counted in a given time.

Corollary 1 Under the conditions of the theorem, the variance of
the estimated population total, for a fixed cost, is also proportional to

$$\frac{C_u S_u^2}{M_u^2}$$

This follows by the same argument.

Corollary 2 In the analysis of variance, variances for units of
different sizes are often computed on what is called a common basis.
The variance $S_u^2$ among totals of units of size $M_u$ is divided by $M_u$.
Suppose, for example, that we wished to present all the variances in
table 9.1 in terms of the variance per 1 ft of row. Since the 2-ft row
unit is the total of 2 single feet, the variance 6.746 is divided by 2,
giving 3.373. Similarly the variances for the third and fourth units
are divided by 6 and 12, respectively. The results are as follows:

COMPARABLE VARIANCES PER SINGLE FOOT OF ROW

                 Type of unit
1-ft       2-ft       1-ft       2-ft
row        row        bed        bed
2.537      3.373      3.849      5.713

When placed on a common basis, the variances still increase steadily
with the increasing size of unit.

Let $S_u'^2 = S_u^2/M_u$ so that the quantities $S_u'^2$ are on a common
basis. Also let $C_u'$ be the cost of taking a given bulk of sample, so
that $C_u' \propto C_u/M_u$. Then theorem 9.1 may be stated as follows:

Relative net cost for equal precision $\propto C_u' S_u'^2$
Relative net precision $\propto 1/(C_u' S_u'^2)$

This result shows that, if we are ignoring differences in the costs of
taking the sample (i.e. assuming $C_u'$ constant), relative net precision
is inversely proportional to $S_u'^2$. In other words, in order to compare
different units for the same total bulk of sample, the relevant
quantities are the variances among units, reduced to a common basis.
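
As a check on the theorem and its second corollary, the sketch below applies both forms of the result to the data of table 9.1. The cost per unit is taken here as the unit's size divided by the feet of row that can be counted in 15 min, which is an assumption of this sketch consistent with the time study; the function names are illustrative.

```python
def relative_cost_equal_precision(C_u, S2_u, M_u):
    """Theorem 9.1: relative cost for equal precision, C_u * S_u^2 / M_u^2."""
    return C_u * S2_u / M_u ** 2


def common_basis(C_u, S2_u, M_u):
    """Corollary 2: cost and variance per unit of bulk, (C_u/M_u, S_u^2/M_u)."""
    return C_u / M_u, S2_u / M_u


M = [1, 2, 6, 12]
S2 = [2.537, 6.746, 23.094, 68.558]
FEET = [44, 62, 78, 108]
# Assumed cost per unit, in 15-min intervals: feet in the unit / feet per interval.
C = [m / f for m, f in zip(M, FEET)]

direct = [relative_cost_equal_precision(c, s2, m) for c, s2, m in zip(C, S2, M)]
on_common_basis = [common_basis(c, s2, m) for c, s2, m in zip(C, S2, M)]
via_corollary = [c_prime * s2_prime for c_prime, s2_prime in on_common_basis]

common_variances = [round(s2_prime, 3) for _c, s2_prime in on_common_basis]
net_precision = [round(100 * direct[0] / d) for d in direct]
print(common_variances, net_precision)
```

The two forms agree unit by unit, and the relative net precisions match the last line of table 9.2.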

The results of theorem 9.1 and its corollaries remain valid for
stratified sampling with proportional allocation, if all strata are of the
same size and if $S_u^2$, $S_u'^2$ represent average variances within strata.
This is so because, under the conditions stated, the variance of the
estimated population total, ignoring the fpe, is $N_u^2 S_u^2/n_u$, and
therefore assumes the same form as with simple random sampling.
Theorem 9.1 does not hold for more complex types of sampling.

The preceding results are intended merely as an illustration of the 
general procedure. Comparisons among units should always be made 
for the kind of sampling that is to be used in practice, or if this has not 
been decided, for the kinds that are under consideration. Changes in the 
method of sampling or of estimation will change the relative net pre- 
cisions of the different units. Even with a fixed method of sampling 
and estimation, relative net precisions will vary with size of sample 
if the cost is not a linear function of size or if the size is large enough 
so that the fpe must be taken into account. 

There is usually more than one item to consider. One approach is
to fix the total cost, and work out the relative net precisions for each
type of unit and each item. Unless one type of unit is uniformly
superior, some compromise decision is made, giving principal weight to
the most important items.

In view of the numerous factors which influence the results, a study
of optimum size of unit in an extensive survey is a large task. A good
example for farm sampling is described by Jessen (1942). An excerpt
from his results is given in table 9.3. This compares 4 sizes of unit: a
quarter-section, a half-section, a section, and a block consisting of 2
contiguous sections. The section is an area 1 mile square, containing
on the average slightly under 4 farms. In this comparison the total
field cost ($1000), the length of questionnaire (60 min to complete),
and the travel cost (5 cents per mile) are all specified, because relative
net precisions change if any of these variables is altered. Costs are
at a 1939 level.


TABLE 9.3 ESTIMATED STANDARD ERRORS (IN PER CENT) FOR FOUR SIZES OF
UNIT, WITH SIMPLE RANDOM SAMPLING

                                                                      Best
Items                            S/4     S/2          S            2S     unit
Number of swine                  5.0     4.9          5.3          6.2    S/2
Number of horses                 3.4     3.3          3.6          4.2    S/2
Number of sheep                 17.4    15.7         14.9         14.3    2S
Number of chickens               3.0     3.0      [illegible]      3.8    S/4, S/2
Number of eggs yesterday         5.7     5.2          4.9          4.7    2S
Number of cattle                 4.7     4.6          4.8          5.5    S/2
Number of cows milked            3.7     3.6          3.8          4.4    S/2
Number of gallons of milk        4.4     4.2          4.4          4.9    S/2
Dairy products receipts          5.5     5.2          5.4          6.0    S/2
Number of farm acres             2.9     2.8          3.0          3.5    S/2
Number of corn acres             3.7     3.5          3.8          4.4    S/2
Number of oat acres              4.6     4.8          5.6          7.0    S/4
Corn yield                       1.6  [illegible]     2.0          2.5    S/4
Oat yield                        1.6     1.5          1.6          1.8    S/2
Commercial feed expenditures    12.6    13.6         16.7         21.8    S/4
Total expenditures, operator     7.8     8.1          9.6         12.0    S/4
Total receipts, operator         6.2     6.5      [illegible]      9.8    S/4
Net cash income, operator        6.8     6.9          7.8          9.5    S/4


The data in the table are the relative standard errors (in per cent)
of the estimated means per farm for 18 items. No unit is best for all
items. The half-section and the quarter-section are, however, superior
to the larger units for all except 2 items, with little to choose between
the half- and quarter-sections. The half-section would probably be
preferred, because the problem of identifying the boundaries accurately
is easier.

In order to make any comparison of this kind, we must know the 
variability among units for each type of unit that is included. How 
are such data obtained? One source, as in the nursery bed example, 
is a complete count of the population for each type of unit. Another 
is the drawing of separate samples for each type of unit. Such methods 
may be employed when the population is compact and it is not too 
costly to obtain the data. With extensive populations, however, it is 
seldom feasible to make a survey solely for the purpose of comparing 
different types of unit. Information about optimum type of unit is 
more usually procured as an ingenious by-product of a survey whose 
main purpose is to make estimates. Some techniques for doing this 
are outlined in succeeding sections. 


9.4 Comparisons made from survey data. Suppose that in a survey 
each unit can be divided into M smaller units. Instead of recording 
only the totals for each “large” unit in the sample, we record data 
separately for each of the M small units. A comparison can then be 
made of the precision of the large and small units. A simple random 
sample of size n will be assumed at first. 

The analysis of variance in table 9.4 can be computed from the sample.


TABLE 9.4  ANALYSIS OF VARIANCE OF THE SAMPLE DATA
(ON A SMALL-UNIT BASIS)

                                            df          ms
Between large units                       (n - 1)     $s_b^2$
Between small units within large units    n(M - 1)    $s_w^2$
Between small units in sample             (nM - 1)    $s^2 = \dfrac{(n-1)s_b^2 + n(M-1)s_w^2}{nM-1}$


The estimated variance of a large unit (on a small-unit basis) is $s_b^2$.


It might be thought that an appropriate estimate of the variance of a
small unit would be the mean square between all small units in the
sample, i.e.

\[
s^2 = \frac{(n-1)s_b^2 + n(M-1)s_w^2}{nM-1} \tag{9.3}
\]

This estimate, although in many cases satisfactory, is biased, because 
the sample is not a simple random sample of small units, since these 
are sampled in contiguous groups of M units. 
An unbiased estimate is obtained from the sample by constructing
an analysis of variance, as in table 9.5, for the whole population, which 
contains N large units and NM small units. 


TABLE 9.5  ANALYSIS OF VARIANCE FOR THE WHOLE POPULATION
(ON A SMALL-UNIT BASIS)

                                            df          ms
Between large units                       (N - 1)     $S_b^2$
Between small units within large units    N(M - 1)    $S_w^2$
Between small units in the population     (NM - 1)    $S^2 = \dfrac{(N-1)S_b^2 + N(M-1)S_w^2}{NM-1}$


By its definition, the population variance among small units is given
by the last line of the table, i.e.

\[
S^2 = \frac{(N-1)S_b^2 + N(M-1)S_w^2}{NM-1}
\]

With simple random sampling, $s_b^2$ in table 9.4 is an unbiased estimate
of $S_b^2$ (this follows from section 2.3). It may be shown easily that
$s_w^2$ is an unbiased estimate of $S_w^2$. Hence an unbiased estimate of the
variance $S^2$ among all small units in the population is

\[
\hat{S}^2 = \frac{(N-1)s_b^2 + N(M-1)s_w^2}{NM-1} \tag{9.4}
\]


If n exceeds 50, estimates (9.3) and (9.4) are practically identical,
since both reduce approximately to

\[
\hat{S}^2 \doteq \frac{s_b^2 + (M-1)s_w^2}{M} \tag{9.5}
\]


The two estimates, $s_b^2$ (for the original unit) and $\hat{S}^2$ (for the small
unit), are then inserted into the appropriate variance formulas for the
estimated population total.

If the sample is large, the small units may be measured for a random
subsample of the large units (say 100 out of 600). Alternatively, two
small units, chosen at random from each large unit, might be measured.
More than one size of small unit may be investigated simultaneously,
provided that we take data which give an unbiased estimate of $S_w^2$
for each small unit.

With stratified sampling, the variances for the large and small units
can be estimated by these methods separately in each stratum, and
then substituted in the appropriate formula for the variance of the
estimate from a stratified sample.
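
The three estimates of the small-unit variance discussed in this section can be compared numerically. The sketch below is only an illustration: the function names and the input values (n, N, M and the two mean squares) are hypothetical and do not come from any survey in the text.

```python
def pooled_mean_square(n, M, sb2, sw2):
    """Estimate (9.3): mean square between all small units in the sample."""
    return ((n - 1) * sb2 + n * (M - 1) * sw2) / (n * M - 1)


def unbiased_small_unit_variance(N, M, sb2, sw2):
    """Estimate (9.4): built from the constructed population analysis of variance."""
    return ((N - 1) * sb2 + N * (M - 1) * sw2) / (N * M - 1)


def large_sample_approximation(M, sb2, sw2):
    """Approximation (9.5), to which (9.3) and (9.4) both reduce for large n."""
    return (sb2 + (M - 1) * sw2) / M


# Hypothetical inputs, chosen only for illustration.
n, N, M = 60, 2000, 4
sb2, sw2 = 5.0, 2.0

print(pooled_mean_square(n, M, sb2, sw2))
print(unbiased_small_unit_variance(N, M, sb2, sw2))
print(large_sample_approximation(M, sb2, sw2))
```

With n = 60 the three values differ only in the second decimal place, which is the sense in which (9.3) and (9.4) are practically identical for n above 50.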

Example. The data come from a farm sample taken in North Caro- 
lina in 1942 in order to estimate farm employment (Finkner, Morgan, 
and Monroe, 1943). The method of drawing the sample was to locate 
points at random on the map and choose as sampling units the three 
farms that were nearest to each point. This method is not recom- 
mended, because a large farm has a greater chance of inclusion in the 
sample than a small farm and an isolated farm has a greater chance 
than a farm in a densely farmed area. Any effects of this bias will be 
ignored in the present illustration. 

The sample was stratified, the stratum being a group of townships 
that were similar in density of farm population and in ratio of crop- 
land to farmland. Some data for the sample taken in May are shown 
in table 9.6. 


TABLE 9.6 SIZES OF POPULATION AND SAMPLE 


Population Sample 
No. of strata 587 572 
No. of sampling units 72,849 1397 
No. of farms 217,976 4165 


It will be noted that a few strata were not sampled and that the 
number of farms per unit was slightly under 3. These discrepancies 
will be ignored. The sampling ratio was 1.9 per cent. 

From the sample data we can compare the group of 3 farms with the 
individual farm as sampling unit. We shall use a slightly simpler 
analysis than is strictly required. The fpc can be omitted. Since the 
sampling was stratified, the variance of the estimated population total is 


\[
V(\hat{Y}) = \sum_h \frac{N_h^2 S_h^2}{n_h}
\]

The standard procedure is to compute, within each stratum, an estimate
of $S_h^2$ for the two types of unit, and substitute in this formula.
The strata contained in general between 300 and 450 farms, and either
two or three 3-farm units were taken in each stratum so as to make the
sampling approximately proportional. Assuming proportionality, i.e.
$n_h/N_h = n/N$, we may write

\[
V(\hat{Y}) = \frac{N}{n}\sum_h N_h S_h^2 \doteq \frac{N^2}{n}\,\bar{S}_h^2
\]

if we assume further that the $S_h^2$ do not vary greatly among strata, so
that they may be replaced by their average, $\bar{S}_h^2$.

Estimates of $\bar{S}_h^2$ are obtained from the analysis of variance in table
9.7, which is on a single-farm basis.


TABLE 9.7 SAMPLE ANALYSIS OF VARIANCE (NUMBER OF PAID WORKERS) 
(SINGLE-FARM BASIS) 


df ms 
Between units within strata 825 6.218 
Between farms within units 2768 2.918 
Between farms within strata 3593 3.676 


For the group of 3 farms, the mean square $s_b^2 = 6.218$ serves as
the estimate of $\bar{S}_h^2$. To obtain an estimate of the variance within
strata for the individual farm, we construct an analysis of variance
for the whole population (table 9.8). The degrees of freedom come
from table 9.6, and the first two mean squares from table 9.7.


TABLE 9.8  CONSTRUCTED ANALYSIS OF VARIANCE FOR THE WHOLE POPULATION

                                                   Estimated
                                        df            ms
Between units within strata           72,262         6.218
Between farms within units           145,127         2.918
Between farms within strata          217,389


The estimated variance between farms within strata is then computed as

\[
\frac{(72{,}262)(6.218) + (145{,}127)(2.918)}{217{,}389} = 4.015
\]


Since the estimated variances, 6.218 for the group of 3 farms and
4.015 for the individual farm, are on a common basis, their ratio
measures the relative precision of the two units. The group of farms
gives only about 65 per cent (4.015/6.218) of the precision of the
single farm. Consideration of costs would presumably make the result
more favorable to the 3-farm unit.
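
The arithmetic of this example can be retraced in a few lines. The sketch below uses only the figures in tables 9.6 and 9.7; the variable names are illustrative.

```python
# Degrees of freedom for the constructed population analysis of variance,
# taken from the population column of table 9.6.
df_between_units = 72849 - 587          # units within strata
df_within_units = 217976 - 72849        # farms within units
df_farms_within_strata = 217976 - 587   # farms within strata

# Sample mean squares from table 9.7 (single-farm basis).
ms_between_units = 6.218
ms_within_units = 2.918

# Estimated variance between farms within strata, as in table 9.8.
var_single_farm = (
    df_between_units * ms_between_units + df_within_units * ms_within_units
) / df_farms_within_strata

# Precision of the 3-farm group relative to the single farm.
relative_precision = var_single_farm / ms_between_units

print(round(var_single_farm, 3))
print(round(relative_precision, 2))
```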


9.5 Variance functions. Suppose that in the preceding example we
wished to compare the precision of the 3-farm unit with that of a unit
consisting of 2 or 6 or 10 farms. We would require some method of
predicting the variance $S_b^2$ between units in the population as a
function of M, the size of the unit. By the analysis of variance, $S_b^2$ can
be found if we know (i) the variance $S^2$ between all elements in the
population and (ii) the variance $S_w^2$ between elements that lie in the
same unit. In the method to be presented here, the approach is to
predict $S_w^2$ and $S^2$ and to find $S_b^2$ by the analysis of variance.

The sample data produce estimates of $S^2$ and $S_w^2$ for the size of unit
actually used. Since $S^2$ is the variance among elements, it is not
affected by the size of the unit. However, $S_w^2$ will be affected. It
might be expected to increase as the size of the large unit increases. If
the large units which are to be examined differ little in size from the
unit actually used, a first approximation is to regard $S_w^2$ as constant,
using the estimate given by the sample data. An investigation by
McVay (1947) suggests that this approximation may often be satisfactory.

As a better approximation, attempts have been made (Jessen, 1942;
Mahalanobis, 1944; Hendricks, 1944) to develop a general law which
predicts how $S_w^2$ changes with the size of unit. In several agricultural
surveys, $S_w^2$ appeared to be related to M by the empirical formula

\[
S_w^2 = AM^g \qquad (g > 0) \tag{9.6}
\]

where A and g are constants that do not depend on M. In this formula
$S_w^2$ increases steadily as M increases. Usually g is small. A curve of
this type might be expected when there are forces that exert a similar
influence on elements that are close together. Climate, soil type,
topography, and access to markets tend to make neighboring farms
have similar features.

Theoretically, the formula is open to objection, since it makes $S_w^2$
increase without bound as M increases. If we assume, as seems reasonable,
that there is no correlation between elements that are very far
apart, a formula in which $S_w^2$ approaches an upper bound with large
M would be more appropriate. However, any formula will suffice if
it gives a good fit over the range of M that is under investigation.

If this formula fits, log $S_w^2$ should plot as a straight line against
log M. Values of $S_w^2$ for at least two values of M are needed in order
to estimate the constants log A and g. At least three values of M are
necessary for any appraisal of the linearity of the fit.

From the analysis of variance in table 9.5 we find

\[
S_b^2 = \frac{(NM-1)S^2 - N(M-1)S_w^2}{N-1}
      = \frac{(NM-1)S^2 - N(M-1)AM^g}{N-1} \tag{9.7}
\]

If N, the number of large units,* is large, this takes the simpler form

\[
S_b^2 = MS^2 - (M-1)AM^g \tag{9.8}
\]


Hendricks (1944) has pointed out that the complete population
might be regarded as a single large sampling unit containing NM
elements. If formula (9.6) holds, then $S^2 = A(NM)^g$. The advantage
of this device is that the values of A and g can now be estimated from
the data for a survey in which only one value of M was used. The
two equations which lead to the estimates are

\[
\log S_w^2 = \log A + g \log M
\]
\[
\log S^2 = \log A + g \log (NM)
\]

The formula for $S_b^2$ becomes, from (9.7),

\[
S_b^2 = \frac{AM^g\{(NM-1)N^g - N(M-1)\}}{N-1}
\]

This method furnishes no check on the correctness of formula (9.6). 
It might happen that the formula held well enough for small values of 
M, but failed for a value as large as NM. In this event the more 
general formulas (9.7) and (9.8) should be employed. 
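
Hendricks's device amounts to solving two linear equations in log A and g. A minimal sketch follows, assuming estimates of $S_w^2$ and $S^2$ are already in hand; the function names and the numerical inputs are hypothetical, and the prediction step uses the large-N form (9.8).

```python
import math


def fit_power_law(sw2, s2, N, M):
    """Solve log Sw2 = log A + g log M and log S2 = log A + g log(NM).

    Subtracting the first equation from the second gives
    g = (log S2 - log Sw2) / log N.
    """
    g = (math.log(s2) - math.log(sw2)) / math.log(N)
    A = sw2 / M ** g
    return A, g


def between_unit_variance(s2, A, g, M):
    """Formula (9.8): Sb2 = M*S2 - (M - 1)*A*M**g, valid when N is large."""
    return M * s2 - (M - 1) * A * M ** g


# Hypothetical estimates from a survey that used units of M = 4 elements.
A, g = fit_power_law(sw2=2.0, s2=3.0, N=1000, M=4)

# Predicted between-unit variance for some other sizes of unit.
for size in (2, 4, 6, 10):
    print(size, between_unit_variance(3.0, A, g, size))
```

As the text warns, the fit rests entirely on formula (9.6) holding all the way out to a unit of NM elements, so the predictions carry no internal check.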

Formula (9.6) is presented as an example of the methodology rather
than as a general law. The reader who faces a similar problem should
construct and test whatever type of formula seems most appropriate
to his material. In some cases log $S_w^2$ might be a simple function of M.


The second component, $c_2\sqrt{n}$, measures the cost of travel between
the clusters. Tests on a map showed that this cost, for a fixed population,
varies approximately as the square root of the number of clusters.
Total field cost is therefore

\[
C = c_1Mn + c_2\sqrt{n} \tag{9.9}
\]


* In the references cited, N is usually defined as the number of elements in the 
population, so that formulas (9.7) and (9.8) have a different appearance.

Assuming simple random sampling and ignoring the fpc, the variance
of the mean per element $\bar{\bar{y}}$ is $S_b^2/nM$. From (9.8), this equals

\[
V(\bar{\bar{y}}) = \frac{S^2 - (M-1)AM^{g-1}}{n} \tag{9.10}
\]

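To determine the optimum size of unit, we find M, and incidentally
n, so as to minimize V for fixed C. The general solution is complicated,
although its application in a numerical problem presents no great
difficulty.

By some manipulation we can obtain the equation which gives the
optimum M. First solve the cost equation (9.9) as a quadratic in $\sqrt{n}$.
This gives

\[
\frac{2c_1M\sqrt{n}}{c_2} = \left(1 + \frac{4Cc_1M}{c_2^2}\right)^{1/2} - 1 \tag{9.11}
\]

The equation to be minimized is

\[
C + \lambda V = c_1Mn + c_2\sqrt{n} + \lambda V
\]

Differentiating, and noting that $\partial V/\partial n = -V/n$, we obtain the
equations

\[
n:\quad c_1M + \tfrac{1}{2}c_2n^{-1/2} = -\lambda\frac{\partial V}{\partial n} = \frac{\lambda V}{n} \tag{9.12}
\]
\[
M:\quad c_1n = -\lambda\frac{\partial V}{\partial M} \tag{9.13}
\]

Divide (9.13) by (9.12) so as to eliminate $\lambda$. This leads to

\[
\frac{n}{V}\frac{\partial V}{\partial M} = -\frac{c_1n}{c_1M + \tfrac{1}{2}c_2n^{-1/2}}
\]

or

\[
\frac{M}{V}\frac{\partial V}{\partial M} = -\frac{1}{1 + \dfrac{c_2}{2c_1M\sqrt{n}}} \tag{9.14}
\]

If we substitute for $\sqrt{n}$ from (9.11), we obtain, after some simplification,

\[
\frac{M}{V}\frac{\partial V}{\partial M} = \left(1 + \frac{4Cc_1M}{c_2^2}\right)^{-1/2} - 1 \tag{9.15}
\]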
By writing out the left side of this equation in full and changing
signs on both sides, we find

\[
\frac{AM^{g-1}\{gM - (g-1)\}}{S^2 - (M-1)AM^{g-1}} = 1 - \left(1 + \frac{4Cc_1M}{c_2^2}\right)^{-1/2}
\]

This equation gives the optimum M. The left side does not involve
any of the cost factors, being dependent only on the shape of the
variance function. It follows that for a given population, which will have
some fixed variance function, the optimum M reacts to changes in the
cost factors in such a way that the quantity

\[
Cc_1M/c_2^2
\]

remains fixed.
Now $c_1$ increases if the length of interview increases, while $c_2$
decreases if travel becomes cheaper, or if the farms in a given area
become more dense. These facts lead to the conclusion that the optimum
size of unit becomes smaller when:

Length of interview increases.

Travel becomes cheaper.

The elements (farms) become more dense.

Total amount of money used (C) increases.

The conclusions are a con[...] would require re-examination with g[...]
illustrate the fact that the opti[...] the population, but depends
also [...] levels of prices and wages.
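
The remark that a numerical problem "presents no great difficulty" can be illustrated by a direct search over M. The sketch below assumes the cost function (9.9) and the variance function (9.10); it treats n as continuous, takes $c_1$ to be the per-element cost (an assumption, since the definition of the first cost component is not given above), and uses hypothetical values for every input.

```python
import math


def clusters_affordable(C, c1, c2, M):
    """Solve C = c1*M*n + c2*sqrt(n) for n, as a quadratic in sqrt(n)."""
    root_n = (-c2 + math.sqrt(c2 ** 2 + 4 * c1 * M * C)) / (2 * c1 * M)
    return root_n ** 2


def variance_of_mean(S2, A, g, M, n):
    """Formula (9.10): V = {S2 - (M - 1)*A*M**(g - 1)} / n."""
    return (S2 - (M - 1) * A * M ** (g - 1)) / n


def optimum_unit_size(C, c1, c2, S2, A, g, max_M=50):
    """Return the integer M in 1..max_M that minimizes V for fixed cost C."""
    def v_at(M):
        return variance_of_mean(S2, A, g, M, clusters_affordable(C, c1, c2, M))
    return min(range(1, max_M + 1), key=v_at)


# Hypothetical cost and variance constants, for illustration only.
budget, travel = 1000.0, 100.0
S2, A, g = 1.0, 0.5, 0.1

m_short = optimum_unit_size(budget, 1.0, travel, S2, A, g)  # short interview
m_long = optimum_unit_size(budget, 2.0, travel, S2, A, g)   # interview cost doubled
print(m_short, m_long)
```

Doubling $c_1$ in this made-up case does not increase the optimum M, in line with the first of the four conclusions listed above.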


9.7 Variance in terms of intra-cluster correlation. When the sampling
unit is a cluster of M elements, variance formulas are sometimes
expressed in terms of the correlation coefficient $\rho$ between elements in
the same cluster. An example of this approach has already been given
for systematic sampling (section 8.3).

Let $y_{ij}$ be the observed value for the jth element within the ith unit,
and let $y_i$ be the unit total. In cluster sampling we need to distinguish
between two kinds of average: the mean per unit $\bar{Y} = \sum y_i/N$, and
the mean per element $\bar{\bar{Y}} = \sum y_i/NM = \bar{Y}/M$. The variance among
elements is

\[
S^2 = \frac{\sum_i \sum_j (y_{ij} - \bar{\bar{Y}})^2}{NM - 1}
\]

The intra-cluster correlation coefficient, $\rho$, may be defined as

\[
\rho = \frac{\sum_i \sum_{j<k} (y_{ij} - \bar{\bar{Y}})(y_{ik} - \bar{\bar{Y}})}{NM(M-1)S^2/2}
\]

The numerical term in the divisor is the number of cross-products
NM(M - 1)/2 in the sum in the numerator.

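Consider the variance of the estimated mean per element $\bar{\bar{y}}$. With a
simple random sample of n complete clusters,

\[
V(\bar{\bar{y}}) = \frac{1}{M^2}V(\bar{y}) = \frac{(N-n)}{NnM^2}\,\frac{\sum_i (y_i - \bar{Y})^2}{(N-1)} \tag{9.16}
\]

since $\bar{\bar{y}} = \bar{y}/M$. But

\[
(y_i - \bar{Y}) = (y_{i1} - \bar{\bar{Y}}) + (y_{i2} - \bar{\bar{Y}}) + \cdots + (y_{iM} - \bar{\bar{Y}})
\]

Square and sum over all N clusters:

\[
\sum_i (y_i - \bar{Y})^2 = \sum_i \sum_j (y_{ij} - \bar{\bar{Y}})^2 + 2\sum_i \sum_{j<k} (y_{ij} - \bar{\bar{Y}})(y_{ik} - \bar{\bar{Y}})
= (NM-1)S^2 + NM(M-1)\rho S^2
\]

Substitute in equation (9.16) for $V(\bar{\bar{y}})$:

\[
V(\bar{\bar{y}}) = \frac{(N-n)}{N}\,\frac{S^2}{nM^2}\left\{\frac{NM-1}{N-1} + \frac{NM(M-1)}{N-1}\rho\right\}
\]

For N large, this reduces to

\[
V(\bar{\bar{y}}) \doteq \frac{(N-n)}{N}\,\frac{S^2}{nM}\{1 + (M-1)\rho\} \tag{9.17}
\]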


If a simple random sample of nM elements is taken, the formula for
$V(\bar{\bar{y}})$ is the same as (9.17) except for the term in braces. The factor

\[
1 + (M-1)\rho
\]

shows by how much the variance is changed by the use of a cluster
instead of an element as sampling unit. If $\rho > 0$, the cluster is less
precise for a given bulk of sample. If $\rho < 0$, as sometimes happens,
the cluster is more precise.
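
For a small population the quantities in this section can be computed directly. The sketch below uses the definition of the intra-cluster correlation given in this section (divisor NM(M - 1)S²/2); the function name and the tiny population of 4 clusters of 3 elements are hypothetical.

```python
from itertools import combinations


def intracluster_summary(clusters):
    """Return (S2, rho, factor) for a population of equal-sized clusters."""
    N, M = len(clusters), len(clusters[0])
    elements = [y for cluster in clusters for y in cluster]
    mean_per_element = sum(elements) / (N * M)
    S2 = sum((y - mean_per_element) ** 2 for y in elements) / (N * M - 1)
    # Sum of cross-products over all pairs j < k within each cluster.
    cross = sum(
        (a - mean_per_element) * (b - mean_per_element)
        for cluster in clusters
        for a, b in combinations(cluster, 2)
    )
    rho = cross / (N * M * (M - 1) * S2 / 2)
    return S2, rho, 1 + (M - 1) * rho


# Hypothetical population in which neighbors resemble one another.
clusters = [[1, 2, 3], [4, 6, 5], [9, 7, 8], [2, 4, 3]]
S2, rho, factor = intracluster_summary(clusters)
print(S2, rho, factor)
```

A factor above 1 indicates that, for the same number of elements, these clusters would give a less precise mean than a simple random sample of elements.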


9.8 Cluster sampling for proportions. The same techniques apply to
cluster sampling for proportions. Suppose that the M elements in any
cluster can be classified into two classes, and that $p_i = a_i/M$ is the
proportion in class C in the ith cluster. A simple random sample of n
clusters is taken, and the average p of the observed $p_i$ in the sample is
used as the estimate of the population proportion P.

It will be recalled (section 3.2) that we cannot use binomial theory
to find V(p), but must apply the formula for continuous variates to
the $p_i$. This gives

\[
V(p) = \frac{(N-n)}{Nn}\,\frac{\sum_{i=1}^{N}(p_i - P)^2}{N-1}
\]

Alternatively, if we take a simple random sample of nM elements, the
variance of p is obtained by binomial theory (theorem 3.2) as

\[
V_{bin}(p) = \frac{(NM - nM)}{NM - 1}\,\frac{PQ}{nM} \doteq \frac{(N-n)}{N}\,\frac{PQ}{nM}
\]

if N is large. Consequently, the factor

\[
\frac{V(p)}{V_{bin}(p)} \doteq \frac{M\sum (p_i - P)^2}{NPQ} \qquad (N \text{ large}) \tag{9.18}
\]


shows the relative change in the variance due to the use of clusters.
Numerical values of this factor are helpful in making preliminary
estimates of sample size with cluster sampling. The required sample
size is first estimated by the binomial formula, and then multiplied
by the factor to indicate the size that will be necessary with cluster
sampling. For an illustration, see Cornfield (1951).

If the cluster sizes $M_i$ are variable, the estimate $p = \sum a_i/\sum M_i$
is a ratio estimate. Its variance is given approximately by the formula
(section 6.9)

\[
V(p) \doteq \frac{(N-n)}{Nn\bar{M}^2}\,\frac{\sum_{i=1}^{N} M_i^2(p_i - P)^2}{N-1}
\]

where $\bar{M} = \sum M_i/N$ is the average size of cluster.
If this sample is compared with a simple random sample of $n\bar{M}$
elements, we find, as a generalization of (9.18),

\[
\frac{V(p)}{V_{bin}(p)} \doteq \frac{\sum M_i^2(p_i - P)^2}{N\bar{M}PQ} \tag{9.19}
\]

As with continuous variates, the relationship of size of cluster to
between-cluster variance can be investigated, either by expressing the
factor in equations (9.18) and (9.19) as a function of M or by finding
a relation between the within-cluster variance and M. If we assign
the value 1 to any unit which falls in class C, and 0 to any other unit,
the fundamental analysis of variance equation for fixed M is

\[
NMP(1 - P) = M\sum (p_i - P)^2 + M\sum p_i(1 - p_i)
\]

Total ss = ss between clusters + ss within clusters


From this relation the mean square within clusters can be computed 
and plotted as a function of M. McVay (1947) describes how this 
analysis can be used to investigate optimum cluster size. 
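
The factor (9.18) and the analysis of variance identity for a 0-1 variate can both be checked on a small set of clusters. In the sketch below the function name and the counts $a_i$ are hypothetical; N is small here, so the factor is shown only to illustrate the computation, not as a large-N approximation worth trusting.

```python
def cluster_proportion_summary(counts, M):
    """Return (factor, total_ss, between_ss, within_ss) for clusters of size M.

    counts[i] is a_i, the number of elements of cluster i that fall in class C.
    """
    N = len(counts)
    p = [a / M for a in counts]
    P = sum(p) / N
    Q = 1 - P
    between_ss = M * sum((pi - P) ** 2 for pi in p)
    within_ss = M * sum(pi * (1 - pi) for pi in p)
    total_ss = N * M * P * Q
    factor = between_ss / (N * P * Q)   # formula (9.18)
    return factor, total_ss, between_ss, within_ss


# Hypothetical counts for 6 clusters of 4 elements each.
factor, total_ss, between_ss, within_ss = cluster_proportion_summary(
    [0, 1, 4, 2, 3, 2], M=4
)
print(factor, total_ss, between_ss, within_ss)
```

A binomial sample-size estimate would be multiplied by this factor to give the size needed with clusters, as described in the text.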


9.9 Measures of the size of a unit. In many surveys the units vary 
in size. A house or dwelling place, suitably defined, is often a con- 
venient unit in surveys of human populations, but it may contain 
anywhere between 0 and 25 or more persons. In examples of this 
kind, we can define the size of the cluster as the number M; of ele- 
ments which it contains. 

There are other populations in which obvious differences in size
among units exist, although it is less clear how size is to be measured. 
Farms, banks, and restaurants are examples. There are large and 
small farms. As a measure of the size of a farm, however, we might 
propose the total acreage, or the total acreage available for crops, or 
the total value of the farm’s production, or still other quantities. 

What kind of measure of size is useful to the sampler? Suppose that
some item $y_i$ is to be measured on each of a simple random sample of
farms. The sampler fears that $y_i$ will have a high variance, because
there are some farms which year after year give large values of $y_i$
whereas others consistently yield small values. What is needed is an
auxiliary variable $x_i$ that is obtainable before the survey is taken and
that predicts whether the value of $y_i$ will be large or small. Thus the
problem of finding a measure of size reduces to that of finding an
auxiliary variate which is highly correlated with $y_i$ in some sense of
this term. The choice among total acreage, total tillable acreage, and
total value is made by examining which of the three has the highest
correlation, on the average, with the items that are included in the
survey. We shall not discuss at present how this average correlation
would be calculated, since our interest is in the general concept of a
good measure of size rather than in a specific definition.

For the same survey, the best measure of size may depend on the
item. If the item has been enumerated in a recent census, it is often
found that the best auxiliary variate $x_i$ is the value of this item at the
previous census. In such cases any general measure of size is inferior
to a separate measure for each item. However, available previous
census data may not include the items that are to be in the new survey,
but may provide several general measures of size.

Given a general measure of size, we may utilize it by one of the two
methods discussed in previous chapters. We may stratify by size.
Since the variance of $y_i$ usually increases with $x_i$, the sampling fraction
should ordinarily be changed from stratum to stratum. Complete
enumeration of the strata with the largest units may be advisable.
The second method is to use a ratio or a regression estimate with $x_i$ as
the auxiliary variate. This allows the stratification to be employed to
control some other factor. A combination of the two techniques is
sometimes fruitful.


9.10 Sampling with probability proportional to size. A third tech- 
nique, suggested by Hansen and Hurwitz (1943), is to assign a higher 
probability to the large units when the sample is being drawn. This 
technique has found its principal use in surveys which employ sub- 
sampling (chapter 11), but it is also applicable to the present problem. 
Sampling with probability proportional to size is illustrated in the
following example of a small population of 7 units:

          Measure     Sum of      Assigned
Unit      of size     measures    range
  1          3           3          1-3
  2          1           4          4
  3         11          15          5-15
  4          6          21         16-21
  5          4          25         22-25
  6          2          27         26-27
  7          3          30         28-30

The cumulative sum of the measures of size is formed. To select a
unit, we draw a random number between 1 and 30. Suppose that this
is 19. In the sum, number 19 falls in unit 4, which covers numbers 16
to 21 inclusive. With this method of drawing, the probability that
any unit is selected is proportional to the measure of size assigned to
the unit.

If a second unit is to be selected, the process is repeated with a new
random number between 1 and 30. However, contrary to our previous
practice, we do not forbid the selection of unit 4 a second time.
Selection with replacement is necessary, when n exceeds 1, in order to keep
the probabilities of selection proportional to the sizes. This may be
seen by the extreme case n = 7. If selection were made without
replacement, all units would automatically be chosen, even though we
had gone through the procedure of selection with probability
proportional to size. For values of n between 1 and 7, selection without
replacement leads to probabilities that are intermediate between equal
probabilities and probabilities proportional to size.
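
The cumulative-sum procedure for the 7-unit example can be sketched as follows. The measures of size are those of the example; the function names are illustrative, and Python's standard `random` module stands in for a table of random numbers.

```python
import bisect
import itertools
import random

sizes = [3, 1, 11, 6, 4, 2, 3]                  # measures of size, units 1 to 7
cumulative = list(itertools.accumulate(sizes))  # upper ends of the assigned ranges


def unit_for(number):
    """Return the unit (numbered from 1) whose assigned range contains `number`."""
    return bisect.bisect_left(cumulative, number) + 1


def draw_pps(n, rng=random):
    """Draw n units with replacement, each with probability proportional to size."""
    total = cumulative[-1]
    return [unit_for(rng.randint(1, total)) for _ in range(n)]


print(unit_for(19))   # the random number 19 of the example
print(draw_pps(3))    # a sample of 3 draws; repeats are allowed
```

Because each draw is independent, the same unit may appear more than once, which is what keeps the selection probabilities proportional to size at every draw.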

As a rule, sampling with replacement is less precise than sampling
without replacement. However, when n/N is small, the chance that
the same unit appears twice in the sample is small, and sampling with
replacement is practically equivalent to sampling without
replacement.*


9.11 Theory for selection with arbitrary probabilities. We shall first
establish a few general formulas under selection with any system of
probabilities. Let $z_i$ be the probability of selection of the ith unit,
where the $z_i$ are any set of positive numbers which add to unity. A
sample of size n is selected with replacement as described in the
previous section.

Let $t_i$ be the number of times that the ith unit appears in a specific
sample of size n, where $t_i$ may have any of the values 0, 1, 2, ..., n.
Consider the joint frequency distribution of the $t_i$ for all N units in the
population.

The method of drawing the sample is equivalent to the standard
probability problem in which n balls are thrown into N boxes, the
probability that a ball goes into the ith box being $z_i$ at every throw.
Consequently, the joint distribution of the $t_i$ is the multinomial
expression

\[
\frac{n!}{t_1!\,t_2! \cdots t_N!}\; z_1^{t_1} z_2^{t_2} \cdots z_N^{t_N}
\]

For the multinomial, the following properties of the distribution of
the $t_i$ are well known:

\[
E(t_i) = nz_i; \qquad V(t_i) = nz_i(1 - z_i); \qquad \mathrm{cov}(t_i, t_j) = -nz_iz_j
\]

The sample mean under this system of selection is denoted by $\bar{y}_z$.
The mean is computed in the ordinary way by adding the item values
for the n units in the sample and dividing by n. This implies that, if
a unit has been drawn twice in the sample, its item value receives a
weight 2 in the computation of $\bar{y}_z$.

* Sen (1952) has developed variance formulas for a system of selection in which
the first unit is chosen with probability proportional to size and subsequent units
are chosen with equal probabilities and without replacement. General methods
for sampling without replacement and with unequal probabilities have also been
given by Thompson (1952), Midzuno (1950), and Horvitz and Thompson (1952).

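Theorem 9.2 If $\bar{y}_z$ is the mean of a sample drawn with probability
proportional to $z_i$,

\[
E(\bar{y}_z) = \sum_{i=1}^{N} z_iy_i = \bar{Y}_z \quad \text{(say)} \tag{9.20}
\]
\[
V(\bar{y}_z) = E(\bar{y}_z - \bar{Y}_z)^2 = \frac{1}{n}\sum_{i=1}^{N} z_i(y_i - \bar{Y}_z)^2 \tag{9.21}
\]

Proof: Since any unit in the sample is weighted by the number of
times $t_i$ that it has been drawn, we may write

\[
\bar{y}_z = \frac{1}{n}(t_1y_1 + t_2y_2 + \cdots + t_Ny_N) = \frac{1}{n}\sum_{i=1}^{N} t_iy_i
\]

Note that all the $y_i$ in the population appear in this expression. In
repeated sampling, the t's are the random variables, whereas the $y_i$ are
a set of fixed numbers. Hence

\[
E(\bar{y}_z) = \frac{1}{n}\sum_{i=1}^{N} y_iE(t_i) = \sum_{i=1}^{N} z_iy_i = \bar{Y}_z
\]

Further,

\[
V(\bar{y}_z) = \frac{1}{n^2}\left[\sum_{i=1}^{N} y_i^2V(t_i) + 2\sum_{i<j} y_iy_j\,\mathrm{cov}(t_i, t_j)\right]
\]
\[
= \frac{1}{n}\left[\sum_{i=1}^{N} y_i^2z_i(1 - z_i) - 2\sum_{i<j} y_iy_jz_iz_j\right]
\]
\[
= \frac{1}{n}\left[\sum z_iy_i^2 - \left(\sum z_iy_i\right)^2\right]
= \frac{1}{n}\sum z_i(y_i - \bar{Y}_z)^2
\]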


Theorem 9.3 An unbiased estimate of $V(\bar{y}_z)$ from the sample is

\[
v(\bar{y}_z) = \frac{\sum_{i=1}^{n}(y_i - \bar{y}_z)^2}{n(n-1)}
\]

Proof: By the usual algebraic identity, we may write

\[
\sum_{i=1}^{n}(y_i - \bar{y}_z)^2 = \sum_{i=1}^{n}(y_i - \bar{Y}_z)^2 - n(\bar{y}_z - \bar{Y}_z)^2
\]

Hence

\[
E\sum_{i=1}^{n}(y_i - \bar{y}_z)^2 = E\sum_{i=1}^{n}(y_i - \bar{Y}_z)^2 - nV(\bar{y}_z)
\]

since, by the definition of $V(\bar{y}_z)$, the mean value of the second term on
the right is $-nV(\bar{y}_z)$. Introducing the variables $t_i$, we have

\[
E\sum_{i=1}^{n}(y_i - \bar{y}_z)^2 = E\sum_{i=1}^{N} t_i(y_i - \bar{Y}_z)^2 - nV(\bar{y}_z)
\]

We may now regard the $y_i$ as fixed quantities and the $t_i$ as the random
variables. Since $E(t_i) = nz_i$,

\[
E\sum_{i=1}^{n}(y_i - \bar{y}_z)^2 = n\sum_{i=1}^{N} z_i(y_i - \bar{Y}_z)^2 - nV(\bar{y}_z)
= n^2V(\bar{y}_z) - nV(\bar{y}_z) = n(n-1)V(\bar{y}_z)
\]

by theorem 9.2. This completes the proof.

Note that, in estimating the variance from the sample, we do not
weight the $y_i$ by the $z_i$, because this weighting has already been
introduced in the selection of the sample.
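
Theorems 9.2 and 9.3 can be verified exactly on a small population by listing every possible ordered sample. The sketch below does this with exact fractions; the population values, the selection probabilities, and n = 2 are hypothetical choices made only to keep the enumeration short.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population of N = 4 units with selection probabilities z_i.
y = [Fraction(v) for v in (4, 1, 9, 6)]
z = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
n = 2

# Population quantities appearing in (9.20) and (9.21).
Y_z = sum(zi * yi for zi, yi in zip(z, y))
V_theory = sum(zi * (yi - Y_z) ** 2 for zi, yi in zip(z, y)) / n

# Enumerate all N**n ordered samples drawn with replacement.
E_mean = E_mean_sq = E_v = Fraction(0)
for sample in product(range(len(y)), repeat=n):
    prob = Fraction(1)
    for i in sample:
        prob *= z[i]
    mean = sum(y[i] for i in sample) / n
    v = sum((y[i] - mean) ** 2 for i in sample) / (n * (n - 1))
    E_mean += prob * mean
    E_mean_sq += prob * mean ** 2
    E_v += prob * v

V_exact = E_mean_sq - E_mean ** 2
print(E_mean == Y_z, V_exact == V_theory, E_v == V_theory)
```

Note that the sample variance estimate `v` uses unweighted deviations, as the closing remark of the proof requires.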

These results may now be applied to the estimation of a population 
total when the units are of unequal size. 


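Theorem 9.4 A sample of size n is drawn with probability proportional
to measures of size $z_i = M_i/\sum M_i$. The item totals for the
units in the sample are $y_1, y_2, \ldots, y_n$, where the same unit may appear
more than once, since sampling is with replacement. As an estimate
of the population total Y we take $\hat{Y}_{pps}$ (probability proportional to
size), where

\[
\hat{Y}_{pps} = \frac{1}{n}\left(\frac{y_1}{z_1} + \frac{y_2}{z_2} + \cdots + \frac{y_n}{z_n}\right) \tag{9.22}
\]

Then $\hat{Y}_{pps}$ is an unbiased estimate of Y, with variance

\[
V(\hat{Y}_{pps}) = \frac{1}{n}\sum_{i=1}^{N} z_i\left(\frac{y_i}{z_i} - Y\right)^2 \tag{9.23}
\]

Proof: Apply theorem 9.2 to the variate $y_i/z_i$. Then

\[
E(\hat{Y}_{pps}) = \sum_{i=1}^{N} z_i\left(\frac{y_i}{z_i}\right) = \sum_{i=1}^{N} y_i = Y
\]
\[
V(\hat{Y}_{pps}) = \frac{1}{n}\sum_{i=1}^{N} z_i\left(\frac{y_i}{z_i} - Y\right)^2
\]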
The result is exact for any size of sample. The variance contains no
fpc, because the sampling is with replacement. The estimate $\hat{Y}_{pps}$ is
not suitable for quick computation, since it involves finding the ratio
$y_i/z_i$ on every unit in the sample, and is unlikely to be widely used.

With this estimate, the optimum measures of size are the set of
numbers $M_i$ for which the $z_i$ minimize

\[
\sum_{i=1}^{N} z_i\left(\frac{y_i}{z_i} - Y\right)^2
\]

If the $y_i$ are all positive, it is easy to see that $V(\hat{Y}_{pps})$ is zero when
$M_i \propto y_i$. Consequently the best measures of size are the item totals $y_i$
on the units. This result is not of practical interest, because, if the $y_i$
were known in advance, the sample would be unnecessary. The result
suggests, however, that if the items are relatively stable through time,
the most recently available previous values of the $y_i$ may be the best
measures of size to adopt.
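
The estimate (9.22), together with a sample estimate of its variance obtained by applying theorem 9.3 to the variate $y_i/z_i$, can be sketched as below. The function name is illustrative. The data reuse the 7 measures of size from section 9.10, with hypothetical item totals set exactly proportional to size so that the zero-variance case described above can be seen.

```python
def pps_total_estimate(y_sample, z_sample):
    """Return (estimate, variance_estimate) for a with-replacement pps sample.

    y_sample holds the item totals of the drawn units, and z_sample their
    selection probabilities z_i = M_i / sum(M_i); repeated units are listed
    once per draw.
    """
    n = len(y_sample)
    ratios = [y / z for y, z in zip(y_sample, z_sample)]
    estimate = sum(ratios) / n
    variance_estimate = sum((r - estimate) ** 2 for r in ratios) / (n * (n - 1))
    return estimate, variance_estimate


sizes = [3, 1, 11, 6, 4, 2, 3]
z = [m / sum(sizes) for m in sizes]
y = [2 * m for m in sizes]        # hypothetical items, proportional to size

drawn = [4, 4, 1]                 # unit numbers; unit 4 happened to be drawn twice
estimate, variance_estimate = pps_total_estimate(
    [y[u - 1] for u in drawn], [z[u - 1] for u in drawn]
)
print(estimate, variance_estimate, sum(y))
```

Every ratio $y_i/z_i$ equals the population total here, so any sample reproduces Y apart from floating-point rounding.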


9.12 Comparison with the ratio estimate. A comparison between
$\hat{Y}_{pps}$ and $\hat{Y}_R$, the ratio estimate derived from sampling with equal
probabilities, is of some practical interest. Since $V(\hat{Y}_R)$ is known only
for large samples, the comparison must be restricted to this case. For
the ratio estimate, by theorem 6.1, the approximate variance is

\[
V(\hat{Y}_R) \doteq \frac{N(N-n)}{n(N-1)}\sum_{i=1}^{N}(y_i - Rx_i)^2
\]

Since $x_i = M_i$ and $R = Y/(\sum M_i)$, this may be written

\[
V(\hat{Y}_R) \doteq \frac{N(N-n)}{n(N-1)}\sum_{i=1}^{N}\left(y_i - \frac{M_iY}{\sum M_i}\right)^2
= \frac{N(N-n)}{n(N-1)}\sum_{i=1}^{N}(y_i - Yz_i)^2 \tag{9.24}
\]

For the unbiased estimate with pps sampling, we have from (9.23)

\[
V(\hat{Y}_{pps}) = \frac{1}{n}\sum_{i=1}^{N} z_i\left(\frac{y_i}{z_i} - Y\right)^2
= \frac{1}{n}\sum_{i=1}^{N}\frac{1}{z_i}(y_i - Yz_i)^2 \tag{9.25}
\]
Assuming n/N small, the two comparable quantities are

\[
nV(\hat{Y}_R) = N\sum_{i=1}^{N}(y_i - Yz_i)^2 \tag{9.26}
\]
\[
nV(\hat{Y}_{pps}) = \sum_{i=1}^{N}\frac{1}{z_i}(y_i - Yz_i)^2 \tag{9.27}
\]

For some populations the ratio estimate is superior, and for others
the pps estimate. I do not know any simple rule for predicting the
better estimate; formulas (9.26) and (9.27) can be used to make the
comparison when population data are available.

One result will be presented. Suppose that

\[
y_i = Yz_i + e_i
\]

where $e_i$ is independent of $z_i$ in the probability sense. In arrays in
which $z_i$ is fixed, we assume that

\[
E(e_i) = 0; \qquad E(e_i^2) = az_i^g \quad (g > 0)
\]

If g = 1, this model satisfies the conditions in which the ratio estimate
is a best linear unbiased estimate (section 6.8).

From (9.26),

\[
nV(\hat{Y}_R) = N\sum_{i=1}^{N} e_i^2 = N^2E(e_i^2) = aN^2E(z_i^g)
\]

where the average is now taken over all values of $z_i$.
Similarly, from (9.27),

\[
nV(\hat{Y}_{pps}) = \sum_{i=1}^{N}\frac{1}{z_i}(y_i - Yz_i)^2 = NE\left(\frac{e_i^2}{z_i}\right) = aNE(z_i^{g-1})
\]

Hence

\[
V(\hat{Y}_R) > V(\hat{Y}_{pps}) \quad \text{if} \quad E(z_i^g) - \frac{1}{N}E(z_i^{g-1}) > 0
\]

Since $E(z_i) = 1/N$, because the $z_i$ add to unity, the inequality may be
written

\[
E(z_iz_i^{g-1}) - E(z_i)E(z_i^{g-1}) > 0
\]

This expression is the covariance of the variates $z_i$ and $z_i^{g-1}$. The
covariance vanishes if g = 1. If g > 1, the covariance is positive,
since the variate $z_i$ lies between 0 and 1. If 0 < g < 1, the covariance
is negative.
To summarize, we have assumed that the relation between $y_i$ and
$z_i$ is a straight line through the origin. If the variance of $y_i$ about this
line increases faster than $z_i$, pps sampling is more precise. In the few
studies that have reported data exhibiting the relation between $V(e_i)$
and $z_i$ the variance has increased at a rate somewhere between $az_i$
and $az_i^2$.

This comparison was made for equal sample sizes in the two methods. 
If it costs more to obtain data from a large unit than from a small one, 
the comparison is biased in favor of pps sampling, which tends to con- 
centrate on the larger units. Further, the ratio estimate is simpler 
to compute. 
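
When population data are available, formulas (9.26) and (9.27) can be evaluated directly. The sketch below does so for a hypothetical population of 6 units, constructed so that the squared deviation of $y_i$ from the line $Yz_i$ is proportional to $z_i^2$ (the case g = 2); the function name and all the numbers are illustrative.

```python
def n_times_variances(y, sizes):
    """Return (n*V for the ratio estimate, n*V for the pps estimate).

    Implements (9.26) and (9.27), which assume n/N is small.
    """
    N = len(y)
    Y = sum(y)
    z = [m / sum(sizes) for m in sizes]
    squared_dev = [(yi - Y * zi) ** 2 for yi, zi in zip(y, z)]
    ratio = N * sum(squared_dev)
    pps = sum(d / zi for d, zi in zip(squared_dev, z))
    return ratio, pps


# Hypothetical population: deviations from Y*z_i are +/-2, +/-3, +/-5,
# i.e. proportional to the sizes 2, 3, 5 (so the variance goes as z_i squared).
sizes = [2, 2, 3, 3, 5, 5]
y = [12, 8, 18, 12, 30, 20]

ratio_nv, pps_nv = n_times_variances(y, sizes)
print(ratio_nv, pps_nv)
```

In this constructed case the pps figure is the smaller of the two, as the covariance argument predicts for g > 1.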


9.13 Extension to stratified sampling. Sampling with probability pro- 
portional to size is likely to be useful when a stratification has been 
made by some variable other than size. If the samples within each 
stratum are small and the total sample is not very large, we have 
seen (section 6.11) that the available variance formulas for ratio esti- 
mates are somewhat suspect and that one of the ratio estimates may 
be seriously biased. 

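With pps sampling, the estimated total is the sum of the estimates
from the separate strata:

$$\hat{Y}_{pps} = \sum_h \hat{Y}_h, \qquad \hat{Y}_h = \frac{1}{n_h} \sum_{i=1}^{n_h} \frac{y_{hi}}{z_{hi}}$$

From the previous theorems we obtain

$$V(\hat{Y}_{pps}) = \sum_h \frac{1}{n_h} \sum_{i=1}^{N_h} z_{hi} \left( \frac{y_{hi}}{z_{hi}} - Y_h \right)^2 \qquad (9.28)$$

$$v(\hat{Y}_{pps}) = \sum_h \frac{1}{n_h(n_h - 1)} \sum_{i=1}^{n_h} \left( \frac{y_{hi}}{z_{hi}} - \hat{Y}_h \right)^2 \qquad (9.29)$$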

9.14 Exercises.

9.1 For the data in table 9.1, compare the relative net precisions of the
four types of unit when the object is to estimate the total number of seedlings
in the bed with a standard error of 200 seedlings. (Note that the fpc is
involved.)

9.2 For the data in table 6.2 (p. 126) estimate the relative precision of the
household to the individual for estimating the sex ratio and the proportion of
people who had seen a doctor in the past 12 months, assuming simple random
sampling.

9.3 A population consisting of 2500 elements is divided into 10 strata,
each containing 50 large units composed of 5 elements. The analysis of
variance of the population for an item is as follows, on an element basis:

                                          df      ms
Between strata                             9    30.6
Between large units within strata        490     3.0
Between elements within large units     2000     1.6

Ignoring the fpc, is the relative precision of the large to the small unit greater
with simple random sampling than with stratified random sampling
(proportional allocation)?

9.4 A population containing LNM elements is divided into L strata, each
having N large units, each of which contains M small units. The following
quantities come from the analysis of variance of the population, on an element
basis:

$S_1^2$ = Mean square between strata
$S_2^2$ = Mean square between large units within strata
$S_3^2$ = Mean square between elements within strata

If N is large and the fpc is ignored, show that the relative precision of the
large to the small unit (element) is improved by stratification if
(M — 1) 2 M 1 


“ar Sa 

9.5 The large units in a population arrange themselves into a finite number
of size classes: all units in class h contain $M_h$ small units. (i) Under what
conditions does sampling with pps give, on the average, the same distribution
of the size classes in the sample as stratification by size of unit, with optimum
allocation for fixed sample size? (ii) If the variance among large units in
class h is $kM_h$, where k is a constant for all classes, what system of
probabilities of selection of the units gives a sample in which the sizes have
approximately the same distribution as a stratified random sample with optimum
allocation for fixed sample size?

CHAPTER 10
SUBSAMPLING WITH UNITS OF EQUAL SIZE 


10.1 Introduction. Suppose that each unit in the population can be 
divided into a number of smaller units, or elements. A sample of n 
units has been selected. If elements within a selected unit give similar 
results, it seems uneconomical to measure them all. A common prac- 
tice is to select and measure a sample of the elements in any chosen 
unit. This technique is called subsampling, since the unit is not 
measured completely, but is itself sampled. Another name, due to 
Mahalanobis, is two-stage sampling, because the sample is taken in two 
steps. The first is to select a sample of units, often called the primary 
units, and the second is to select a sample of elements from each chosen 
primary unit. 

Subsampling has a great variety of applications, which go far beyond 
the immediate scope of sample surveys. Whenever any process in- 
volves chemical, physical, or biological tests that can be performed on 
a small amount of material, this is likely to be drawn as a subsample 
from a larger amount which is itself a sample. 

In this chapter we consider the simplest case, in which every unit
contains the same number M of elements, of which m are chosen when
any unit is subsampled. A schematic representation of a two-stage 
sample, with M = 9 and m = 2, is shown in figure 10.1. 

The principal advantage of two-stage sampling is that it is more 
flexible than one-stage sampling. It reduces to one-stage sampling 
when m = M, but unless this is the best choice for m, we have the 
opportunity of taking some smaller value that appears more efficient. 
As usual, the issue reduces to a balance between statistical precision 
and cost. When elements in the same unit agree very closely, con- 
siderations of precision suggest a small value of m. On the other hand, 
it is sometimes almost as cheap to measure the whole of a unit as to 
subsample it, e.g. when the unit is a household and a single respond- 
ent can give accurate data about all members of the household. 

Notation. With multistage sampling, the notation is apt to become
troublesome, because it is necessary to distinguish among several kinds
of mean: the mean per primary unit, mean per subunit, and so on.

[FIGURE 10.1 Schematic representation of two-stage sampling (N = 81, n = 5,
M = 9, m = 2). Marked cells denote elements in the sample.]

The basic scheme is as follows for two-stage sampling. The symbol
$y_{ij}$ denotes the observation obtained for the jth element (subunit) in
the ith unit. As before, y and Y denote the sample and population
totals, respectively.

Sample mean per element in the ith unit: $\bar{y}_i = \sum_{j=1}^{m} y_{ij}/m$

Overall sample mean per element: $\bar{\bar{y}} = \sum_{i=1}^{n} \bar{y}_i/n = y/nm$

Sample mean per primary unit: $\bar{y} = \sum_{i=1}^{n} y_i/n = y/n = m\bar{\bar{y}}$

Analogous definitions hold for the population means $\bar{Y}_i$, $\bar{\bar{Y}}$, and $\bar{Y}$.
The essence of the notation is that a single bar denotes an average
over any single stage, a double bar an average over two stages. The
subscripts (if any) indicate what is being held constant. Thus the
average of the $y_{ij}$ for fixed i is $\bar{y}_i$: the average of the $\bar{y}_i$ over the units
is $\bar{\bar{y}}$.

Note also that N, n are used for the number of primary units, and 
M, m for the number of subunits per primary unit. Since some authors 
reverse these roles, a careful check of the notation is advised when 
reading references cited here. 


10.2 Elementary theory. The theory of subsampling was developed 
in connection with the sampling of the plots in agricultural field experi- 
ments. In these applications both the sampling fraction n/N and the 
subsampling fraction m/M are usually small and fpc's can be ignored.
Since the resulting theory is elegant and is adequate for many applica- 
tions, we shall describe it first. Actually, the elementary theory re- 
quires only that n/N be negligible, say less than 0.05, as will be seen 
when the exact theory is presented. 

The observation $y_{ij}$ in the jth element of the ith unit is assumed to
be of the form

$$y_{ij} = \bar{\bar{Y}} + u_i + w_{ij} \qquad (10.1)$$

where the term $u_i$ represents a component associated with the unit
and constant for all elements in the unit. The term $w_{ij}$ represents a
component of variation from element to element within the unit. The
variates $u_i$ and $w_{ij}$ are all independent in the probability sense and
have zero means. The variates $u_i$ have variance $S_u^2$ (u for unit), and
the $w_{ij}$ have variance $S_w^2$ (w for within).

The values of N and M are assumed infinite. The units are chosen 
at random from the population, and the elements at random from the 
units. 

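It is easy to show, as a consequence of the model, that the sample
mean per element $\bar{\bar{y}}$ is an unbiased estimate of $\bar{\bar{Y}}$.

Theorem 10.1 With this model, the variance of the sample mean
per element $\bar{\bar{y}}$ is

$$V(\bar{\bar{y}}) = \frac{S_u^2}{n} + \frac{S_w^2}{nm} \qquad (10.2)$$

Proof: From (10.1) we have

$$\bar{\bar{y}} = \bar{\bar{Y}} + \frac{u_1 + u_2 + \cdots + u_n}{n} + \frac{w_{11} + w_{12} + \cdots + w_{nm}}{nm}$$

Hence, by the formula for the variance of a mean from an infinite
population,

$$V(\bar{\bar{y}}) = \frac{S_u^2}{n} + \frac{S_w^2}{nm}$$
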
Note that an increase in m diminishes only the contribution from
the variance $S_w^2$ within units: an increase in n diminishes both
components of the variance.


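Theorem 10.2 An unbiased estimate of $V(\bar{\bar{y}})$ is obtained from the
sample as

$$v(\bar{\bar{y}}) = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{m^2 n(n-1)} = \frac{\sum_{i=1}^{n} (\bar{y}_i - \bar{\bar{y}})^2}{n(n-1)}$$

The first form on the right is the most convenient for computing.
Proof: By an algebraic identity we have

$$\sum_{i=1}^{n} (\bar{y}_i - \bar{\bar{y}})^2 = \sum_{i=1}^{n} (\bar{y}_i - \bar{\bar{Y}})^2 - n(\bar{\bar{y}} - \bar{\bar{Y}})^2$$

From equation (10.1), averaging over the m elements in the ith unit in
the sample,

$$\bar{y}_i - \bar{\bar{Y}} = u_i + \frac{w_{i1} + w_{i2} + \cdots + w_{im}}{m}$$

Hence,

$$E(\bar{y}_i - \bar{\bar{Y}})^2 = S_u^2 + \frac{S_w^2}{m}$$

$$E \sum_{i=1}^{n} (\bar{y}_i - \bar{\bar{Y}})^2 = nS_u^2 + \frac{nS_w^2}{m}$$

By theorem 10.1,

$$nE(\bar{\bar{y}} - \bar{\bar{Y}})^2 = nV(\bar{\bar{y}}) = S_u^2 + \frac{S_w^2}{m}$$

Hence, by subtraction,

$$E \sum_{i=1}^{n} (\bar{y}_i - \bar{\bar{y}})^2 = (n-1)\left(S_u^2 + \frac{S_w^2}{m}\right) = n(n-1)V(\bar{\bar{y}})$$

This completes the proof.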

Note that the estimated variance is computed solely from variation 
between units, just as in a design with no subsampling. The variation 
within units, although it does not appear explicitly in the estimated 
variance, is, however, taken into account, as is evident from the term 
$S_w^2/nm$ in $V(\bar{\bar{y}})$ in theorem 10.1.
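As an illustration of theorem 10.2, the estimate can be computed either from the subsample totals of the units or from the unit means. This is a minimal sketch; the function name `elementary_variance_estimate` and the data are hypothetical.

```python
def elementary_variance_estimate(sample):
    """Estimate V of the sample mean per element, ignoring the fpc's.

    sample : list of n lists, each holding the m values observed in one unit
    Returns the estimate computed from unit totals and from unit means.
    """
    n = len(sample)
    m = len(sample[0])

    # Form based on the unit totals y_i (the convenient computing form).
    totals = [sum(unit) for unit in sample]
    mean_total = sum(totals) / n
    v_totals = sum((t - mean_total) ** 2 for t in totals) / (m ** 2 * n * (n - 1))

    # Equivalent form based on the unit means per element.
    means = [t / m for t in totals]
    overall = sum(means) / n
    v_means = sum((yi - overall) ** 2 for yi in means) / (n * (n - 1))

    return v_totals, v_means


# Hypothetical data: n = 3 units, m = 2 elements per unit.
print(elementary_variance_estimate([[1, 3], [5, 7], [2, 6]]))
```

Only the variation between units enters the calculation, as the note above points out.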


10.3 Prediction of the variance for other subsampling ratios. From
the mathematical model we can predict the variance of $\bar{\bar{y}}$ for sampling
and subsampling ratios that are different from those used in a survey
for which we have data. This information is helpful in the planning of
future samples on the same type of population.

Suppose that the initial sample has n units and m elements per unit.
If these numbers are changed to n' and m', respectively, then by
theorem 10.1 the variance of the sample mean becomes

$$V(\bar{\bar{y}}') = \frac{S_u^2}{n'} + \frac{S_w^2}{n'm'} \qquad (10.3)$$

In order to utilize this formula, sample estimates of $S_u^2$ and $S_w^2$ are
required. These are obtained from the analysis of variance of the
sample data shown in table 10.1. The right-hand column indicates
the quantities of which the mean squares $s_b^2$ and $s_w^2$ are unbiased
estimates.

TABLE 10.1 ANALYSIS OF VARIANCE OF THE SAMPLE (ON AN ELEMENT BASIS)

                                  df          ms                                                                   Estimate of
Between units (unit means)        n - 1       $s_b^2 = m \sum_i (\bar{y}_i - \bar{\bar{y}})^2 / (n-1)$             $S_w^2 + mS_u^2$
Within units between elements     n(m - 1)    $s_w^2 = \sum_i \sum_j (y_{ij} - \bar{y}_i)^2 / n(m-1)$              $S_w^2$

The result for $s_b^2$ follows at once from theorem 10.2, since
$s_b^2 = nm\,v(\bar{\bar{y}})$. The result for $s_w^2$ is easily verified. Consequently, an
unbiased estimate of $S_u^2$ is

$$\hat{s}_u^2 = \frac{s_b^2 - s_w^2}{m}$$

Hence an unbiased estimate of $V(\bar{\bar{y}}')$ is

$$v(\bar{\bar{y}}') = \frac{\hat{s}_u^2}{n'} + \frac{s_w^2}{n'm'} = \frac{1}{n'}\left[\frac{s_b^2}{m} + s_w^2\left(\frac{1}{m'} - \frac{1}{m}\right)\right] \qquad (10.4)$$

Example. King and Jebe (1940) report the following analysis of
variance in sampling wheat fields in North Dakota, 1938. Two small
samples were taken from each field, and the fields were stratified by
districts.

TABLE 10.2 ANALYSIS OF VARIANCE OF WHEAT YIELDS (BUSHELS PER ACRE) *

                                              ms
Between fields within districts               $s_b^2 = 180$
Within fields between subunits (elements)     $s_w^2 =$

* Since the analysis presented by King and Jebe refers to the field mean,
the mean squares have been multiplied by 2 to place them on a subunit basis.

Since the fields were not chosen at random, but by following routes
designed so as to give good coverage of the area, the mean square
between fields may be an overestimate of the variance that would be
obtained from a random sample of fields. This disturbance and the
effects of variation in field size will be ignored.

We will examine how the variance of the sample mean is affected by: 


i. Doubling the number of fields, with two subsamples per field. 
ii. Keeping the number of fields unchanged, but taking four sub- 
samples per field. 
iii. Keeping the number of fields unchanged, but completely harvest- 
ing the fields. 


Let n denote the number of fields in the original sample. From
formula (10.4), the following estimated variances for the sample mean
are obtained (note that m = 2):

Original sample (n' = n, m' = 2):    $v = \frac{1}{n}\left(\frac{180}{2}\right) = \frac{90}{n}$

Case i (n' = 2n, m' = 2):            $v_i = \frac{1}{2n}\left(\frac{180}{2}\right) = \frac{45}{n}$

Case ii (n' = n, m' = 4):            $v_{ii} = \frac{1}{n}\left(\frac{180}{2} - \frac{s_w^2}{4}\right)$

Case iii (n' = n, m' = ∞):           $v_{iii} = \frac{1}{n}\left(\frac{180}{2} - \frac{s_w^2}{2}\right)$

In case iii complete harvesting is assumed to be equivalent to taking
all possible subunits from every field in the sample. Since the size of
the subunit was very small compared to the size of a field, this implies
that m' = ∞.

Cases ii and iii show that increases in the subsampling ratio, keeping 
the number of fields constant, produce only modest reductions in the 
variance. If a marked increase in precision is wanted, the number of 
fields must be increased. 
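Formula (10.4) is easy to tabulate for alternative designs. The sketch below is illustrative only: the function name `predicted_variance` is an assumption, and the within-field mean square of 80 is a hypothetical placeholder, not a figure quoted from King and Jebe. The between-field mean square of 180 is the value in table 10.2.

```python
import math


def predicted_variance(sb2, sw2, m, n_new, m_new):
    """Formula (10.4): estimated variance of the mean for a new design.

    sb2, sw2 : mean squares from the original sample (element basis)
    m        : elements per unit in the original sample
    n_new    : number of units in the proposed sample
    m_new    : elements per unit in the proposed sample (math.inf allowed)
    """
    return (sb2 / m + sw2 * (1.0 / m_new - 1.0 / m)) / n_new


sb2 = 180.0   # between fields within districts (table 10.2)
sw2 = 80.0    # HYPOTHETICAL within-field mean square, for illustration only
n = 1         # results are then multiples of 1/n

print("original:", predicted_variance(sb2, sw2, 2, n, 2))
print("case i:  ", predicted_variance(sb2, sw2, 2, 2 * n, 2))
print("case ii: ", predicted_variance(sb2, sw2, 2, n, 4))
print("case iii:", predicted_variance(sb2, sw2, 2, n, math.inf))
```

The original design and case i depend only on the between-field mean square; cases ii and iii change with whatever within-field mean square is supplied.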


10.4 General theory. We now drop the assumptions that n/N and
m/M are small and that the mathematical model holds. It is still
convenient to express the variances in terms of the quantities $S_u^2$ and
$S_w^2$, but these must first be defined in terms of the observations $y_{ij}$.
The definitions are constructed from the analysis of variance for the
complete population, shown in table 10.3. Thus, $S_u^2$ and $S_w^2$ are
defined so that the two equations stated in the two lines of the analysis
of variance are valid. These definitions were chosen because they
enable the general theory to be expressed as a natural extension of
the elementary theory.

TABLE 10.3 ANALYSIS OF VARIANCE FOR THE COMPLETE POPULATION
(ON AN ELEMENT BASIS)

                                  df          ms                                                               Defined as equal to
Between units                     N - 1       $M \sum_{i=1}^{N} (\bar{Y}_i - \bar{\bar{Y}})^2 / (N-1)$          $S_w^2 + MS_u^2$
Within units between elements     N(M - 1)    $\sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{Y}_i)^2 / N(M-1)$    $S_w^2$

With some populations the quantity $S_u^2$ may turn out to be negative;
this happens when elements in the same unit are negatively correlated.
Any feeling of discomfort created by the appearance of a
negative variance can be avoided by expressing the results in terms of
$S_w^2$ and $\rho$ (the intra-unit correlation coefficient) instead of $S_w^2$ and
$S_u^2$. However, all formulas remain correct when $S_u^2$ is negative.

In two-stage sampling, expected values must be found not only over
all possible samples of n units that can be drawn, but also over all
possible subsamples that can be drawn from the selected set of units. It
is often helpful to perform the averaging in successive stages. For this
purpose we introduce two symbols:

$E_i$ = Average over all subsamples from the ith unit.
$E_n$ = Average over all subsamples from a fixed set of n units.

If the m elements in any chosen unit are selected by simple random
sampling,

$$E_i \bar{y}_i = \bar{Y}_i$$

Hence

$$E_n \bar{\bar{y}} = \bar{\bar{Y}}_n$$

where $\bar{\bar{Y}}_n$ denotes the mean that would be obtained if the n units in
the sample were enumerated completely. If, further, the n units are
also chosen by simple random sampling,

$$E \bar{\bar{Y}}_n = \bar{\bar{Y}}$$

This shows that $\bar{\bar{y}}$ is an unbiased estimate of $\bar{\bar{Y}}$.

Theorem 10.3 If the n units and m elements from each chosen unit
are selected by simple random sampling,

$$V(\bar{\bar{y}}) = E(\bar{\bar{y}} - \bar{\bar{Y}})^2 = \frac{(N-n)}{N} \frac{S_u^2}{n} + \frac{(MN - mn)}{MN} \frac{S_w^2}{mn} \qquad (10.5)$$

Proof: Write

$$\bar{\bar{y}} - \bar{\bar{Y}} = (\bar{\bar{y}} - \bar{\bar{Y}}_n) + (\bar{\bar{Y}}_n - \bar{\bar{Y}}) \qquad (10.5')$$

When we square both sides and take the average over any fixed set of
n units, there is no contribution from the cross-product term on the
right, since

$$E_n(\bar{\bar{y}}) = \bar{\bar{Y}}_n$$

Consider the first term on the right. Each of the n units may be
regarded as a stratum composed of M elements. The sample from
these units is a proportionally stratified sample, since m elements are
taken from every stratum. Consequently, the formula in theorem 5.3,
p. 69, for the variance of the mean of a stratified random sample may
be applied. This gives

$$E_n(\bar{\bar{y}} - \bar{\bar{Y}}_n)^2 = \frac{1}{(nM)^2} \sum_{i=1}^{n} \frac{M(M-m)}{m} S_{wi}^2$$

where $S_{wi}^2$ is the variance within the ith unit. This may be rewritten

$$E_n(\bar{\bar{y}} - \bar{\bar{Y}}_n)^2 = \frac{(M-m)}{M} \cdot \frac{1}{mn} S_{wn}^2 \qquad (10.6)$$

where $S_{wn}^2$ is the average variance within these n units. If we further
average over all possible sets of n, it is clear that the average of $S_{wn}^2$
is $S_w^2$, as defined from table 10.3. Hence

$$E(\bar{\bar{y}} - \bar{\bar{Y}}_n)^2 = \frac{(M-m)}{M} \cdot \frac{1}{mn} S_w^2 \qquad (10.7)$$

The contribution from the second term on the right of equation (10.5')
presents no difficulty, since $\bar{\bar{Y}}_n$ is the mean of the values $\bar{Y}_i$ for a simple
random sample of n units. Consequently, by theorem 2.2, p. 15,

$$E(\bar{\bar{Y}}_n - \bar{\bar{Y}})^2 = \frac{(N-n)}{Nn} \frac{\sum_{i=1}^{N} (\bar{Y}_i - \bar{\bar{Y}})^2}{(N-1)} = \frac{(N-n)}{Nn}\left(S_u^2 + \frac{S_w^2}{M}\right) \qquad (10.8)$$

by the definition of $S_u^2$ in table 10.3.
From (10.7) and (10.8) we obtain finally,

$$V(\bar{\bar{y}}) = E(\bar{\bar{y}} - \bar{\bar{Y}})^2 = \frac{(M-m)}{M} \frac{S_w^2}{mn} + \frac{(N-n)}{Nn}\left(S_u^2 + \frac{S_w^2}{M}\right)$$

$$= \frac{(N-n)}{N} \frac{S_u^2}{n} + \frac{(MN - mn)}{MN} \frac{S_w^2}{mn}$$

If n/N is negligible, $V(\bar{\bar{y}})$ reduces to

$$\frac{S_u^2}{n} + \frac{S_w^2}{mn}$$

in agreement with the elementary theory.
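The algebra that combines (10.7) and (10.8) into the final form can be verified numerically. This is a minimal sketch with hypothetical function names and arbitrary variance components; it also checks that the general formula approaches the elementary one when N and M are very large.

```python
def var_two_stage(Su2, Sw2, N, M, n, m):
    """Variance of the sample mean per element, formula (10.5)."""
    return (N - n) / N * Su2 / n + (M * N - m * n) / (M * N) * Sw2 / (m * n)


def var_by_parts(Su2, Sw2, N, M, n, m):
    """Sum of the within-unit part (10.7) and the between-unit part (10.8)."""
    within = (M - m) / M * Sw2 / (m * n)
    between = (N - n) / (N * n) * (Su2 + Sw2 / M)
    return within + between


def var_elementary(Su2, Sw2, n, m):
    """Elementary theory, formula (10.2)."""
    return Su2 / n + Sw2 / (n * m)


# Arbitrary illustrative values.
args = dict(Su2=2.0, Sw2=5.0, N=40, M=12, n=8, m=3)
print(var_two_stage(**args), var_by_parts(**args))
print(var_two_stage(2.0, 5.0, 10**9, 10**9, 8, 3), var_elementary(2.0, 5.0, 8, 3))
```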


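10.5 Estimation of the variance in the general case. The first step
is to find sample estimates of $S_u^2$ and $S_w^2$. If the analysis of variance
of the sample is performed as in table 10.1, it turns out that the mean
squares $s_b^2$ and $s_w^2$ still have the expectations given in the table: i.e.

$$E(s_b^2) = S_w^2 + mS_u^2 \qquad (10.9)$$
$$E(s_w^2) = S_w^2 \qquad (10.10)$$

The result for $s_w^2$ is easily verified, but that for $s_b^2$ is less obvious. It
may be proved by straightforward methods as follows.

$$s_b^2 = \frac{m \sum_{i=1}^{n} (\bar{y}_i - \bar{\bar{y}})^2}{n-1}$$

Now

$$\sum_{i=1}^{n} (\bar{y}_i - \bar{\bar{y}})^2 = \sum_{i=1}^{n} \bar{y}_i^2 - n\bar{\bar{y}}^2 \qquad (10.11)$$

Write

$$\bar{y}_i = \bar{Y}_i + (\bar{y}_i - \bar{Y}_i)$$

Since $\bar{y}_i$ is the mean of a random subsample of size m,

$$E_i(\bar{y}_i^2) = \bar{Y}_i^2 + \frac{(M-m)}{M} \frac{S_{wi}^2}{m}$$

where the average is taken over all subsamples from this unit. Hence,
for a fixed selection of n units,

$$E_n \sum_{i=1}^{n} \bar{y}_i^2 = \sum_{i=1}^{n} \bar{Y}_i^2 + \frac{(M-m)}{M} \frac{n}{m} S_{wn}^2 \qquad (10.12)$$

Further, write

$$\bar{\bar{y}} = \bar{\bar{Y}}_n + (\bar{\bar{y}} - \bar{\bar{Y}}_n)$$

so that

$$E_n \bar{\bar{y}}^2 = \bar{\bar{Y}}_n^2 + E_n(\bar{\bar{y}} - \bar{\bar{Y}}_n)^2$$

where $\bar{\bar{Y}}_n$ is the true mean of all n units in the sample. In equation
(10.6) it was shown that

$$E_n(\bar{\bar{y}} - \bar{\bar{Y}}_n)^2 = \frac{(M-m)}{M} \cdot \frac{1}{mn} S_{wn}^2$$

Hence,

$$E_n\, n\bar{\bar{y}}^2 = n\bar{\bar{Y}}_n^2 + \frac{(M-m)}{M} \frac{S_{wn}^2}{m} \qquad (10.13)$$

From (10.12) and (10.13),

$$E_n \sum_{i=1}^{n} (\bar{y}_i - \bar{\bar{y}})^2 = \sum_{i=1}^{n} (\bar{Y}_i - \bar{\bar{Y}}_n)^2 + \frac{(M-m)}{M} \frac{(n-1)}{m} S_{wn}^2$$

i.e.

$$E_n(s_b^2) = \frac{m \sum_{i=1}^{n} (\bar{Y}_i - \bar{\bar{Y}}_n)^2}{n-1} + \frac{(M-m)}{M} S_{wn}^2$$

Now average over all possible selections of the n units. By theorem
2.4, the first term on the right is an unbiased estimate of m times the
population variance of $\bar{Y}_i$. This gives

$$E(s_b^2) = \frac{m \sum_{i=1}^{N} (\bar{Y}_i - \bar{\bar{Y}})^2}{N-1} + \frac{(M-m)}{M} S_w^2$$

$$= \frac{m}{M}(S_w^2 + MS_u^2) + \left(1 - \frac{m}{M}\right) S_w^2 = S_w^2 + mS_u^2$$

using the equation in table 10.3 which defines $S_u^2$. This completes
the proof.

These results lead to an unbiased estimate of $V(\bar{\bar{y}})$.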


Theorem 10.4 An unbiased estimate of $V(\bar{\bar{y}})$ from the sample is

$$v(\bar{\bar{y}}) = \frac{1}{mn}\left\{\frac{(N-n)}{N} s_b^2 + \frac{(M-m)}{M} \frac{n}{N} s_w^2\right\} \qquad (10.14)$$

Proof: Substitute the expected values of $s_b^2$, $s_w^2$ in $v(\bar{\bar{y}})$.

$$E\,v(\bar{\bar{y}}) = \frac{1}{mn}\left\{\frac{(N-n)}{N}(S_w^2 + mS_u^2) + \frac{(M-m)}{M} \frac{n}{N} S_w^2\right\}$$

$$= \frac{(N-n)}{N} \frac{S_u^2}{n} + \frac{(MN - mn)}{MN} \frac{S_w^2}{mn}$$

by collecting the terms in $S_w^2$.

If m = M, formula (10.14) becomes that appropriate to simple
random sampling of the units.

If n = N, the formula is equivalent to that for proportional stratified
random sampling, since units may then be regarded as strata, all
of which are being sampled. (Incidentally, two-stage sampling is a
kind of incomplete stratification, with the units as strata.)

In the common situation in which n/N is negligible,

$$v(\bar{\bar{y}}) = \frac{s_b^2}{mn} = \frac{\sum_{i=1}^{n} (\bar{y}_i - \bar{\bar{y}})^2}{n(n-1)} \qquad (10.15)$$

This agrees with the result from the elementary theory, theorem 10.2.

When m = 1, the sample provides no estimate of $S_w^2$. This does not
matter provided that n/N is negligible, since in that event $s_w^2$ does
not appear in $v(\bar{\bar{y}})$. One application of this result occurs when the
subsampling is systematic. Since a systematic subsample is equivalent
to a simple random subsample with a more complex type of element
and m = 1, formula (10.15) remains valid with systematic subsampling
unless n/N is substantial. If the first-stage sampling is systematic,
however, the formula holds only if the systematic sample of the
units is equivalent to a simple random sample.
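The estimate in theorem 10.4 can be computed directly from the two mean squares of table 10.1. The sketch below assumes equal subsample sizes with m of at least 2; the function names and the data are hypothetical.

```python
def sample_mean_squares(sample):
    """Return (n, m, overall mean, s_b^2, s_w^2) as defined in table 10.1."""
    n = len(sample)
    m = len(sample[0])
    means = [sum(unit) / m for unit in sample]
    overall = sum(means) / n
    sb2 = m * sum((yi - overall) ** 2 for yi in means) / (n - 1)
    sw2 = sum(
        (y - yi) ** 2 for unit, yi in zip(sample, means) for y in unit
    ) / (n * (m - 1))
    return n, m, overall, sb2, sw2


def two_stage_variance_estimate(sample, N, M):
    """Formula (10.14): unbiased estimate of the variance of the mean."""
    n, m, _, sb2, sw2 = sample_mean_squares(sample)
    return ((N - n) / N * sb2 + (M - m) / M * (n / N) * sw2) / (m * n)


# Hypothetical data: n = 3 of N = 30 units, m = 2 of M = 10 elements.
data = [[1, 3], [5, 7], [2, 6]]
print(sample_mean_squares(data))
print(two_stage_variance_estimate(data, N=30, M=10))
```

When N is made very large the result approaches $s_b^2/mn$, the form in (10.15).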


10.6 Optimum sampling and subsampling fractions. These depend
on the type of cost function. One form that has proved useful is

$$C = c_u n + c_e nm \qquad (10.16)$$

The first component of cost, $c_u n$, is proportional to the number of units
in the sample; the second, $c_e nm$, to the total number of elements.
We choose n and m so as to minimize

$$V(\bar{\bar{y}}) + \lambda(C - c_u n - c_e nm) \qquad (10.17)$$

Since m enters into V and C only in the combination nm, put k = nm,
and write (10.17) as

$$\left(\frac{1}{n} - \frac{1}{N}\right) S_u^2 + \left(\frac{1}{k} - \frac{1}{MN}\right) S_w^2 + \lambda(C - c_u n - c_e k)$$

Differentiation with respect to n and k gives, respectively,

$$-\frac{S_u^2}{n^2} = \lambda c_u, \qquad -\frac{S_w^2}{k^2} = \lambda c_e$$

Hence

$$m_{opt} = \frac{k}{n} = \frac{S_w}{S_u} \sqrt{\frac{c_u}{c_e}} \qquad (10.18)$$

The structure of this formula is as would be expected. The greater the
variability within units relative to that among units, the higher the
optimum m. Similarly, the greater the cost $c_u$ of access to the unit
relative to the cost $c_e$ of obtaining data from any element in the unit,
the higher the optimum m.

The value of n is found by solving either the cost equation or the
variance equation, depending on whether cost or variance has been
fixed.

For practical use, estimates of $S_w$ and $S_u$ are usually obtained from
the analysis of variance (table 10.1) of a sample with specific values of
n and m. The sample estimate of $m_{opt}$ is

$$\hat{m}_{opt} = \frac{s_w}{\hat{s}_u} \sqrt{\frac{c_u}{c_e}} = \frac{\sqrt{m}}{\sqrt{(s_b^2/s_w^2) - 1}} \sqrt{\frac{c_u}{c_e}} \qquad (10.19)$$

The value found will be non-integral. It is sufficient to take the
nearest integer, although Cameron (1951) has pointed out that this is
not quite the best rule. If $\hat{m}_{opt}$ lies between the two integers m,
(m + 1), we should choose (m + 1) if $\hat{m}_{opt}^2 > m(m + 1)$: otherwise
we choose m (see exercise 10.3). Thus, if $\hat{m}_{opt}$ is between $1.414 = \sqrt{2}$
and 2, we round upwards to 2. If $\hat{m}_{opt}$ is greater than M, or if $s_b^2$ is
less than $s_w^2$, we take m = M and employ one-stage sampling.
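The estimate (10.19) and the rounding rule attributed to Cameron can be put together as follows. This is a sketch under the cost function (10.16); the function names are illustrative, and returning infinity when the between-unit mean square does not exceed the within-unit mean square is a convention adopted here so that the rounding step falls back to m = M.

```python
import math


def estimate_m_opt(sb2, sw2, m, cu, ce):
    """Formula (10.19). Returns math.inf when s_b^2 <= s_w^2."""
    if sb2 <= sw2:
        return math.inf
    return math.sqrt(m * cu / ce) / math.sqrt(sb2 / sw2 - 1.0)


def round_m_opt(m_opt, M):
    """Choose an integer m: take m + 1 if m_opt**2 > m(m + 1), else m."""
    if m_opt >= M:
        return M                      # one-stage sampling
    lower = math.floor(m_opt)
    if lower < 1:
        return 1                      # at least one element per unit
    return lower + 1 if m_opt ** 2 > lower * (lower + 1) else lower


# Illustrative values: m = 4, cu/ce = 10, mean squares consistent with
# S_w = 1.3 S_u (s_b^2 = S_w^2 + m S_u^2 = 1.69 + 4).
m_hat = estimate_m_opt(sb2=5.69, sw2=1.69, m=4, cu=10.0, ce=1.0)
print(m_hat, round_m_opt(m_hat, M=100))
print(round_m_opt(1.5, M=100), round_m_opt(1.4, M=100))
```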

The estimate $\hat{m}_{opt}$ is itself subject to a sampling error, whose size
depends on several factors, but mainly on the number n of units in the
sample from which $s_b^2$ is computed. Confidence limits for $m_{opt}$ tend
to be wide when n is less than 10. However, the optimum is broad,
and an error of a few units in m may produce only a small loss of
precision. The following example presents a method for investigating
this issue.

Example. Let

$$c_u = 10c_e, \qquad S_w = 1.3S_u$$

then

$$m_{opt} = 1.3\sqrt{10} = 4.1$$

We will regard total cost as fixed, and see how the variance of $\bar{\bar{y}}$ changes
with m. Both N and M are assumed large.

$$V(\bar{\bar{y}}) = \frac{S_u^2}{n} + \frac{S_w^2}{nm} = \left(S_u^2 + \frac{S_w^2}{m}\right)\left(\frac{c_u + mc_e}{C}\right)$$

eliminating n by means of the cost equation. In our example, V may
be written

$$V = \frac{S_u^2 c_e}{C}\left(1 + \frac{S_w^2}{mS_u^2}\right)\left(\frac{c_u}{c_e} + m\right) = \frac{S_u^2 c_e}{C}\left(1 + \frac{1.69}{m}\right)(10 + m)$$

Omitting the constant factor, the relative variance can be calculated
for different values of m. Table 10.4 shows these variances and the
relative precisions (with the maximum precision for m = 4 taken as
the standard).

TABLE 10.4 RELATIVE VARIANCES AND PRECISIONS FOR DIFFERENT VALUES OF m

m =            |     1 |     2 |     3 |     4 |     5 |     6 |     7 |     8 |     9 |    10
Rel. variance  | 29.59 | 22.14 | 20.32 | 19.92 | 20.07 | 20.51 | 21.10 | 21.80 | 22.56 | 23.38
Rel. precision |  0.67 |  0.90 |  0.98 |  1.00 |  0.99 |  0.97 |  0.94 |  0.91 |  0.88 |  0.85

For values of m between 2 and 9, inclusive, the loss of precision
relative to the optimum is less than 12 per cent. An interesting
application of this type of analysis to the British monthly surveys of
sickness is given by Gray and Corlett (1950).
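The entries of table 10.4 follow from the relative variance with the constant factor omitted. A short sketch, using the cost ratio of 10 and the variance ratio of 1.69 from the example (the function name is an illustrative choice):

```python
def relative_variance(m, cost_ratio=10.0, var_ratio=1.69):
    """(1 + S_w^2 / (m S_u^2)) * (c_u / c_e + m), constant factor omitted."""
    return (1.0 + var_ratio / m) * (cost_ratio + m)


values = {m: relative_variance(m) for m in range(1, 11)}
best = min(values.values())

for m, v in values.items():
    # Relative precision is taken against the smallest variance (m = 4).
    print(f"m = {m:2d}  rel. variance = {v:6.2f}  rel. precision = {best / v:4.2f}")
```

The flat region around m = 4 is what makes the optimum broad: moving a few units either way changes the variance only slightly.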

We now consider how well $m_{opt}$ can be estimated from an initial
sample with m = 4 and various values of n. From (10.19)

$$\hat{m}_{opt} = \frac{\sqrt{m c_u/c_e}}{\sqrt{(s_b^2/s_w^2) - 1}} = \frac{6.324}{\sqrt{(s_b^2/s_w^2) - 1}}$$

If the original variates $y_{ij}$ are normally distributed, it is known from
analysis of variance theory that $s_b^2/s_w^2$ is distributed as

$$F\left(1 + \frac{mS_u^2}{S_w^2}\right) = 3.367F$$

where F follows the F-distribution with (n - 1) and n(m - 1) degrees
of freedom (Eisenhart, 1947). This gives

$$\hat{m}_{opt} = \frac{6.324}{\sqrt{3.367F - 1}}$$

This result provides confidence limits for $\hat{m}_{opt}$. Take n = 10, and
consider 80 per cent limits. The degrees of freedom are 9 and 30.
From the 10 per cent significance levels of F (Merrington and Thompson,
1943), we find

$$F_{.10}(9, 30) = 1.8490$$
$$F_{.90}(9, 30) = 1/F_{.10}(30, 9) = 1/2.2547 = 0.4435$$

Substitution of these values of F gives

Lower limit: $\hat{m}_{opt}$ = 2.8
Upper limit: $\hat{m}_{opt}$ = 9.0

As we have seen from table 10.4, any m in this range gives a degree of
precision that is fairly close to the optimum. Thus, with n = 10, the
chances are 8 in 10 that the loss in precision is small.

The 80 per cent and 95 per cent confidence limits for n = 5, 10, 20
appear in table 10.5. The upper limits m = ∞ which occur in three
cases imply single-stage sampling.
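The limits for n = 10 can be reproduced from the two F values quoted above. This sketch hard-codes those tabulated values rather than calling a statistics library; the function name is illustrative, and an infinite limit is returned when the quantity under the square root is not positive.

```python
import math


def m_opt_limit(F, m=4, cost_ratio=10.0, var_ratio=1.69):
    """Value of the estimated optimum m when s_b^2/s_w^2 = (1 + m/var_ratio) F.

    var_ratio is S_w^2 / S_u^2 (1.69 in the example), cost_ratio is c_u / c_e.
    """
    scale = 1.0 + m / var_ratio          # 3.367 when m = 4
    denom = scale * F - 1.0
    if denom <= 0:
        return math.inf                  # implies single-stage sampling
    return math.sqrt(m * cost_ratio) / math.sqrt(denom)


# F values quoted in the text for 9 and 30 degrees of freedom.
lower = m_opt_limit(1.8490)
upper = m_opt_limit(1.0 / 2.2547)
print(round(lower, 1), round(upper, 1))
```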


TABLE 10.5 CONFIDENCE LIMITS FOR $\hat{m}_{opt}$

 n     80 per cent     95 per cent
 5     2.5, ∞          1.8, ∞
10     2.8, 9.0        2.3, ∞
20     3.1, 6.4        2.7, 9.1

To summarize, with n = 20 we are almost certain to estimate $m_{opt}$
well enough so that the precision actually attained is near the optimum.
This is not true with n = 5. Results may of course be different with
other values of the costs and variances.

10.7 Subsampling for proportions. If the elements are classified into
two classes and we wish to estimate the proportion that falls in the
first class, the preceding formulas can be applied by the usual device
of defining $y_{ij}$ as 1 if the corresponding element falls in this class, and
as zero otherwise. Let $p_i = a_i/m$ be the proportion falling in the first
class in the subsample from the ith unit. The two estimated variances,
$s_b^2$ and $s_w^2$, work out as follows:

$$s_b^2 = \frac{m \sum_{i=1}^{n} (p_i - \bar{p})^2}{n-1}$$

$$s_w^2 = \frac{m}{n(m-1)} \sum_{i=1}^{n} p_i q_i$$

where $\bar{p} = \sum p_i/n$. Consequently, the formula for the estimated
variance in two-stage sampling is (by theorem 10.4)

$$v(\bar{p}) = \frac{(N-n)}{N} \frac{1}{n(n-1)} \sum_{i=1}^{n} (p_i - \bar{p})^2 + \frac{(M-m)}{MN} \frac{1}{n(m-1)} \sum_{i=1}^{n} p_i q_i$$

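For proportions the estimate needs only the counts $a_i$ in each subsample. A minimal sketch, assuming equal subsample sizes m of at least 2; the function name and the counts are hypothetical.

```python
def proportion_variance_estimate(counts, m, N, M):
    """Estimated proportion and its estimated variance in two-stage sampling.

    counts : number of elements in the first class in each of the n subsamples
    m      : elements subsampled per unit
    N, M   : units in the population and elements per unit
    """
    n = len(counts)
    p = [a / m for a in counts]
    p_bar = sum(p) / n
    # Between-unit term: (N - n)/N times sum (p_i - p_bar)^2 / n(n - 1).
    between = sum((pi - p_bar) ** 2 for pi in p) / (n * (n - 1))
    # Within-unit term: (M - m)/(M N) times sum p_i q_i / n(m - 1).
    within = sum(pi * (1.0 - pi) for pi in p) / (n * (m - 1))
    return p_bar, (N - n) / N * between + (M - m) / (M * N) * within


# Hypothetical counts from n = 3 units with m = 10 elements each.
print(proportion_variance_estimate([2, 4, 6], m=10, N=50, M=100))
```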
10.8 Three-stage sampling. The process of subsampling is sometimes
carried to a third stage by sampling the subunits (elements) instead of
enumerating them completely. For instance, in surveys to estimate
crop production in India (Sukhatme, 1947), the village is a convenient
sampling unit. Within a village, only some of the fields growing the
crop in question are selected, so that the field is a subunit. When a
field is selected, only certain parts of it are cut for the determination
of yield per acre: thus the subunit itself is sampled. If physical or
chemical analyses of the crop are involved, an additional subsampling
may be used, since these determinations are often made on a part of
the sample cut from a field.

Results for the elementary theory will be given briefly. The population
contains N units, each with M subunits, each of which has K
subsubunits. The corresponding numbers for the sample are n, m,
and k, respectively. The model is

$$y_{iju} = \bar{\bar{\bar{Y}}} + u_i + w_{ij} + e_{iju}$$

where the components $u_i$, $w_{ij}$, $e_{iju}$ are all independently distributed
with means zero and variances $S_u^2$, $S_w^2$, and $S_{ww}^2$, respectively. It
follows that the variance of the sample mean per subsubunit is

$$V(\bar{\bar{\bar{y}}}) = \frac{S_u^2}{n} + \frac{S_w^2}{nm} + \frac{S_{ww}^2}{nmk} \qquad (10.20)$$
The sample analysis of variance, on a subsubunit basis, is shown in
table 10.6.

TABLE 10.6 ANALYSIS OF VARIANCE FOR THREE-STAGE SAMPLING

                                      df           ms                                                                              Estimates of
Between units                         n - 1        $s_b^2 = mk \sum_i (\bar{\bar{y}}_i - \bar{\bar{\bar{y}}})^2 / (n-1)$            $S_{ww}^2 + kS_w^2 + mkS_u^2$
Between subunits within units         n(m - 1)     $s_w^2 = k \sum_i \sum_j (\bar{y}_{ij} - \bar{\bar{y}}_i)^2 / n(m-1)$            $S_{ww}^2 + kS_w^2$
Between subsubunits within subunits   nm(k - 1)    $s_{ww}^2 = \sum_i \sum_j \sum_u (y_{iju} - \bar{y}_{ij})^2 / nm(k-1)$           $S_{ww}^2$

The expectations in the right-hand column are easily verified from
the model. From equation (10.20), an unbiased estimate of $V(\bar{\bar{\bar{y}}})$ is
$s_b^2/nmk$. As in two-stage sampling, an unbiased estimate can also be
obtained of $V(\bar{\bar{\bar{y}}})$ for values of n, m, and k different from those used.

In the general theory the first step is to define the variance components
$S_u^2$, $S_w^2$, and $S_{ww}^2$ by an analysis of variance for the complete
population (table 10.7), analogous to table 10.3 for two-stage sampling.


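TABLE 10.7 ANALYSIS OF VARIANCE FOR THE POPULATION

                                      df           ms                                                                          Defined as equal to
Between units                         N - 1        $MK \sum_i (\bar{\bar{Y}}_i - \bar{\bar{\bar{Y}}})^2 / (N-1)$                $S_{ww}^2 + KS_w^2 + MKS_u^2$
Between subunits within units         N(M - 1)     $K \sum_i \sum_j (\bar{Y}_{ij} - \bar{\bar{Y}}_i)^2 / N(M-1)$                $S_{ww}^2 + KS_w^2$
Between subsubunits within subunits   NM(K - 1)    $\sum_i \sum_j \sum_u (y_{iju} - \bar{Y}_{ij})^2 / NM(K-1)$                  $S_{ww}^2$

Theorem 10.5 If simple random sampling is used at all three stages,

$$V(\bar{\bar{\bar{y}}}) = \left(\frac{1}{n} - \frac{1}{N}\right) S_u^2 + \left(\frac{1}{nm} - \frac{1}{NM}\right) S_w^2 + \left(\frac{1}{nmk} - \frac{1}{NMK}\right) S_{ww}^2$$

Proof: Since the proof is a natural extension of that for two-stage
sampling, only the principal steps will be indicated. Write

$$\bar{\bar{\bar{y}}} - \bar{\bar{\bar{Y}}} = (\bar{\bar{\bar{y}}} - \bar{\bar{\bar{Y}}}_{nm}) + (\bar{\bar{\bar{Y}}}_{nm} - \bar{\bar{\bar{Y}}}_n) + (\bar{\bar{\bar{Y}}}_n - \bar{\bar{\bar{Y}}})$$

where $\bar{\bar{\bar{Y}}}_{nm}$ is the population mean for the nm subunits that were
selected, and $\bar{\bar{\bar{Y}}}_n$ the population mean for the n units that were selected.
When we square and take the average, the cross-product terms vanish.
The contributions of the squared terms turn out to be as follows:

$$E(\bar{\bar{\bar{y}}} - \bar{\bar{\bar{Y}}}_{nm})^2 = \left(\frac{K-k}{K}\right) \frac{1}{nmk} S_{ww}^2$$

$$E(\bar{\bar{\bar{Y}}}_{nm} - \bar{\bar{\bar{Y}}}_n)^2 = \left(\frac{M-m}{M}\right) \frac{1}{nm}\left(S_w^2 + \frac{S_{ww}^2}{K}\right)$$

$$E(\bar{\bar{\bar{Y}}}_n - \bar{\bar{\bar{Y}}})^2 = \left(\frac{N-n}{N}\right) \frac{1}{n}\left(S_u^2 + \frac{S_w^2}{M} + \frac{S_{ww}^2}{MK}\right)$$

When these three terms are added, the theorem is obtained.

Theorem 10.6 An unbiased estimate of $V(\bar{\bar{\bar{y}}})$ from the sample is

$$v(\bar{\bar{\bar{y}}}) = \frac{1}{nmk}\left[\frac{(N-n)}{N} s_b^2 + \frac{(M-m)}{M} \frac{n}{N} s_w^2 + \frac{(K-k)}{K} \frac{nm}{NM} s_{ww}^2\right]$$

The proof reduces to showing that the results stated in table 10.6
for the expected values of $s_b^2$, $s_w^2$, and $s_{ww}^2$ are valid. The details are
straightforward although tedious.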

The extension of these results to further stages of sampling should 
be clear from the structure of the formulas. Optimum sampling frac- 
tions at the second and third stages can be investigated as for two- 


stage sampling. 
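The three-stage variance can be checked against the sum of the contributions from the separate stages. In this sketch the function names and the numerical values are illustrative, and the stage-by-stage terms are written out under the variance-component definitions of table 10.7 (the third-stage component is called `Sww2` here).

```python
def var_three_stage(Su2, Sw2, Sww2, N, M, K, n, m, k):
    """Variance of the sample mean per subsubunit with simple random
    sampling at all three stages."""
    return (
        (1 / n - 1 / N) * Su2
        + (1 / (n * m) - 1 / (N * M)) * Sw2
        + (1 / (n * m * k) - 1 / (N * M * K)) * Sww2
    )


def var_three_stage_by_parts(Su2, Sw2, Sww2, N, M, K, n, m, k):
    """Sum of the contributions of the third, second, and first stages."""
    third = (K - k) / K * Sww2 / (n * m * k)
    second = (M - m) / M * (Sw2 + Sww2 / K) / (n * m)
    first = (N - n) / N * (Su2 + Sw2 / M + Sww2 / (M * K)) / n
    return third + second + first


# Arbitrary illustrative values.
args = (2.0, 5.0, 9.0, 40, 12, 6, 8, 3, 2)
print(var_three_stage(*args), var_three_stage_by_parts(*args))
```

With N, M, and K very large the result approaches the elementary form (10.20).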


10.9 Stratified sampling of the units. Subsampling may be combined 
with any type of sampling of the units. The subsampling itself may 
employ stratification or systematic sampling. Variance formulas for 
these modifications can be built up from the formulas for the simpler 
methods. Results will be given for the combination of subsampling 
with stratified sampling of the units. We assume that unit sizes are 
constant for a given stratum, but may vary from stratum to stratum. 

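The subscript h refers to the stratum. The population variances
$S_{uh}^2$ and $S_{wh}^2$ are in general defined separately for each stratum by an
analysis of variance similar to that in table 10.3. The hth stratum
contains $N_h$ units, each with $M_h$ elements: the corresponding sample
numbers are $n_h$ and $m_h$. The estimated population mean per element is

$$\bar{\bar{y}}_{st} = \frac{\sum_h M_h N_h \bar{\bar{y}}_h}{\sum_h M_h N_h}, \qquad \text{where} \quad \bar{\bar{y}}_h = \frac{\sum_{i=1}^{n_h} \sum_{j=1}^{m_h} y_{hij}}{m_h n_h}$$

By applying theorem 10.3 within each stratum, we find

$$V(\bar{\bar{y}}_{st}) = \frac{\sum_h (M_h N_h)^2 \left\{\frac{(N_h - n_h)}{N_h} \frac{S_{uh}^2}{n_h} + \frac{(M_h N_h - m_h n_h)}{M_h N_h} \frac{S_{wh}^2}{m_h n_h}\right\}}{(\sum_h M_h N_h)^2}$$

From theorem 10.4, a sample estimate that is unbiased is

$$v(\bar{\bar{y}}_{st}) = \frac{\sum_h \frac{(M_h N_h)^2}{m_h n_h}\left[\frac{(N_h - n_h)}{N_h} s_{bh}^2 + \frac{(M_h - m_h)}{M_h} \frac{n_h}{N_h} s_{wh}^2\right]}{(\sum_h M_h N_h)^2}$$

If the sampling fractions $n_h/N_h$ for the units are all small, these
formulas simplify to

$$V(\bar{\bar{y}}_{st}) = \sum_h (M_h N_h)^2 \left(\frac{S_{uh}^2}{n_h} + \frac{S_{wh}^2}{m_h n_h}\right) \Big/ \left(\sum_h M_h N_h\right)^2$$

$$v(\bar{\bar{y}}_{st}) = \sum_h \frac{(M_h N_h)^2}{m_h n_h} s_{bh}^2 \Big/ \left(\sum_h M_h N_h\right)^2$$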
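The stratified estimate and its estimated variance combine the within-stratum results with weights proportional to the number of elements in each stratum. The sketch below is illustrative: the dictionary layout, the function names, and the data are assumptions, and each stratum must have at least two sampled units with at least two elements per unit.

```python
def stratum_estimate(sample, N, M):
    """Mean per element and its estimated variance (formula 10.14)
    for one stratum."""
    n, m = len(sample), len(sample[0])
    means = [sum(unit) / m for unit in sample]
    ybar = sum(means) / n
    sb2 = m * sum((yi - ybar) ** 2 for yi in means) / (n - 1)
    sw2 = sum(
        (y - yi) ** 2 for unit, yi in zip(sample, means) for y in unit
    ) / (n * (m - 1))
    v = ((N - n) / N * sb2 + (M - m) / M * (n / N) * sw2) / (m * n)
    return ybar, v


def stratified_two_stage(strata):
    """Weighted mean per element and its estimated variance over strata."""
    total = sum(s["N"] * s["M"] for s in strata)
    ybar_st, v_st = 0.0, 0.0
    for s in strata:
        weight = s["N"] * s["M"] / total      # share of all elements
        ybar_h, v_h = stratum_estimate(s["sample"], s["N"], s["M"])
        ybar_st += weight * ybar_h
        v_st += weight ** 2 * v_h
    return ybar_st, v_st


# Hypothetical two-stratum survey.
strata = [
    {"N": 30, "M": 10, "sample": [[1, 3], [5, 7], [2, 6]]},
    {"N": 20, "M": 5, "sample": [[2, 2], [4, 4]]},
]
print(stratified_two_stage(strata))
```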


10.10 Exercises. 


10.1 From a simple random sample of fields of corn, 2 subunits (each 
consisting of 10 hills) were chosen in each field. The following mean squares 


come from an analysis of variance of the number of ears per hill (on a single- 
hill basis) : 


$ ms 
Between fields 5.89 
Between subunits within fields 1.41 


If it takes 1 hr to locate each field, and 10 min to locate and count 1 sub- 
unit (after the field is reached), what is the optimum number of subunits per 
field? (The fpe’s may be ignored.) 

10.2 In the same survey, the mean square between hills within the same
subunit was 0.92. Assuming that this mean square would not change
appreciably if the subunit contained 20 hills, estimate the change in
precision if one 20-hill unit were taken per field instead of two
10-hill units.

10.3 Verify the rule (section 10.6), that, if $m_{opt}$ lies between
the two integers m, (m + 1), we should choose (m + 1) if

$$m_{opt}^2 > m(m + 1)$$


10.4 Show that, if $S_u^2 > 0$, in the notation of section 10.4, a
simple random sample of n primary units, with 1 element chosen per
unit, is more precise than a simple random sample of n elements
(n > 1, M > 1). Show that the precision of the two methods is equal if
n/N is negligible. Would you expect this intuitively?


CHAPTER 11
SUBSAMPLING WITH UNITS OF UNEQUAL SIZE 


11.1 Introduction. If two-stage sampling is to be used when the
primary units vary in size, one method is to stratify by size of unit,
so that the units within a stratum become equal, or nearly so, in size.
The formulas in section 10.9 may then be an adequate approximation.
Sometimes, however, units vary so much in size that substantial
differences remain within some of the strata, and sometimes it is
advisable to base the stratification on variables other than size. In
a review of the British Social Surveys, which are mostly nationwide
samples with districts as primary units, Gray and Corlett (1950) point
out that size was at first included as one of the variables for
stratification, but another factor was found more desirable when the
characteristics of the population became better known.

Some concentrated effort is required in order to obtain a good working
knowledge of multi-stage sampling when the units vary in size, because
the technique is very flexible. The units may be chosen either with
equal probabilities or with probabilities proportional to size or to
some estimate of size. Various rules can be devised to determine the
subsampling fractions, and various methods of estimation are
available. The advantages of the different methods depend on the nature
of the population, on the field costs, and on the supplementary data
that are at our disposal.

The first part of this chapter is devoted to a description of the
principal methods that are in use. We shall begin with a population
that consists of a single stratum. The extension to stratified sampling
can be made, as in previous chapters, by summing the appropriate
variance formulas over the strata. For simplicity, we assume at first
that only a single primary unit is chosen: i.e. that n = 1. This case
is not as impractical as might appear at first sight because when there
is a large number of strata we may be able to achieve satisfactory
precision in estimation even though $n_h = 1$. In the series of monthly
surveys taken by the U. S. Census Bureau in order to estimate numbers
of employed people, the county is the primary unit. This is a large
unit, but it has administrative advantages that decrease costs. Since
counties are far from uniform in their characteristics, stratification
of counties is extended in these surveys to the point where only one
county is selected from each stratum. Consequently, the theory to be
discussed here is applicable to a single stratum in this kind of
sampling plan.

As in previous chapters, the quantities to be estimated may be the
population total Y, the population mean (usually the mean per element
$\bar{\bar{Y}}$), or a ratio of two variates. At present the discussion
is confined to the population mean per element.

Notation. The observation for the jth element within the ith unit is
denoted by $y_{ij}$. The following symbols refer to the ith unit:

                          Population                    Sample
Number of elements        $M_i$                         $m_i$
Mean per element          $\bar{Y}_i$                   $\bar{y}_i$
Total                     $Y_i = M_i \bar{Y}_i$         $y_i = m_i \bar{y}_i$

The following symbols refer to the whole population or sample:

                          Population                    Sample
Number of elements        $M = \sum_{i=1}^{N} M_i$      $m = \sum_{i=1}^{n} m_i$
Total                     $Y = \sum_{i=1}^{N} Y_i$      $y = \sum_{i=1}^{n} y_i$
Mean per element          $\bar{\bar{Y}} = Y/M$         $\bar{\bar{y}} = y/m$
Mean per primary unit     $\bar{Y} = Y/N$               $\bar{y} = y/n$


The notation departs from that of chapter 10 in one respect. We
define M and m as the total number of elements in the population and
sample, respectively, whereas in chapter 10 these symbols were the
corresponding totals in any primary unit (all units being of equal
size). To keep the notation consistent, symbols $\bar{M}$ and $\bar{m}$
should have been used in chapter 10.


11.2 Sampling methods when n = 1. Suppose that the ith unit is
selected and that it contains $M_i$ elements, of which $m_i$ are
sampled at random. We consider three methods of estimating
$\bar{\bar{Y}}$.

I. Units chosen with equal probability.

Estimate = $\bar{\bar{y}}_{I} = \bar{y}_i$.

The estimate is the sample mean per element. It is biased. For, in
repeated sampling from the same unit, the average of $\bar{y}_i$ is
$\bar{Y}_i$, and since every unit has an equal chance of being
selected, the average of $\bar{Y}_i$ is

$$\frac{1}{N} \sum_{i=1}^{N} \bar{Y}_i = \bar{Y}_u \quad \text{(say)}$$

But the population mean is

$$\bar{\bar{Y}} = \frac{\sum_{i=1}^{N} M_i \bar{Y}_i}{M}, \quad \text{where} \quad M = \sum_{i=1}^{N} M_i$$

Hence the bias equals $(\bar{Y}_u - \bar{\bar{Y}})$.
To find $V(\bar{\bar{y}}_{I})$, write

$$\bar{\bar{y}}_{I} - \bar{\bar{Y}} = (\bar{y}_i - \bar{Y}_i) + (\bar{Y}_i - \bar{Y}_u) + (\bar{Y}_u - \bar{\bar{Y}})$$

Square and take the expectation over all possible samples. All
contributions from cross-product terms vanish. The expectations of the
squared terms follow easily by the methods given in chapter 10. We
find

$$V(\bar{\bar{y}}_{I}) = \underbrace{\frac{1}{N} \sum_{i=1}^{N} \frac{(M_i - m_i)}{M_i} \frac{S_i^2}{m_i}}_{\text{Within units}} + \underbrace{\frac{1}{N} \sum_{i=1}^{N} (\bar{Y}_i - \bar{Y}_u)^2}_{\text{Between units}} + \underbrace{(\bar{Y}_u - \bar{\bar{Y}})^2}_{\text{Bias}} \qquad (11.1)$$

where

$$S_i^2 = \frac{1}{M_i - 1} \sum_{j=1}^{M_i} (y_{ij} - \bar{Y}_i)^2$$

is the variance among elements in the ith unit.

The variance of $\bar{\bar{y}}_{I}$ contains three components: one
arising from variation within units, one from variation between the
true means of the units, and one from the bias.

The values of the $m_i$ have not been specified. The most common
choice is either to take all $m_i$ equal, or to take $m_i$ proportional
to $M_i$, i.e. to subsample a fixed proportion of whatever unit is
selected. The choice of the $m_i$ affects only the first of the three
components of the variance, the component that arises from variation
within units.

II. Units chosen with equal probability.

Estimate = $\bar{\bar{y}}_{II} = N M_i \bar{y}_i / M$.

This estimate is unbiased. Since $\bar{y}_i$ is an unbiased estimate of
$\bar{Y}_i$, the product $M_i \bar{y}_i$ is an unbiased estimate of the
unit total $Y_i$. Hence $N M_i \bar{y}_i$ is an unbiased estimate of
the population total Y. Dividing by M, the total number of elements in
the population, we obtain an unbiased estimate of $\bar{\bar{Y}}$.

To find $V(\bar{\bar{y}}_{II})$, we have

$$\bar{\bar{y}}_{II} - \bar{\bar{Y}} = \frac{N M_i \bar{y}_i}{M} - \bar{\bar{Y}}$$

Now $M_i \bar{Y}_i = Y_i$, the total for the unit, and
$\bar{\bar{Y}} = N\bar{Y}/M$, where $\bar{Y}$ is the population mean per
unit. This gives

$$\bar{\bar{y}}_{II} - \bar{\bar{Y}} = \frac{N M_i}{M} (\bar{y}_i - \bar{Y}_i) + \frac{N}{M} (Y_i - \bar{Y})$$

Hence,

$$V(\bar{\bar{y}}_{II}) = \frac{N}{M^2} \sum_{i=1}^{N} M_i (M_i - m_i) \frac{S_i^2}{m_i} + \frac{N}{M^2} \sum_{i=1}^{N} (Y_i - \bar{Y})^2 \qquad (11.2)$$

The "between-unit" component of this variance (second term on the
right) represents the variation among the unit totals $Y_i$. This
component is affected both by variations in the $M_i$ from unit to unit
and by variations in the means $\bar{Y}_i$ per element. If the units
vary considerably in size, this component is large even though the
means per element $\bar{Y}_i$ are practically constant from unit to
unit. Frequently this component is so large that $\bar{\bar{y}}_{II}$
has a much higher variance than the biased estimate
$\bar{\bar{y}}_{I}$. Thus neither method I nor method II is fully
satisfactory.

III. Units chosen with probability proportional to size.

Estimate = $\bar{\bar{y}}_{III} = \bar{y}_i$ = sample mean.

This technique is due to Hansen and Hurwitz (1943). It gives a sample
mean that is unbiased and is not subject to the inflation of the
variance in method II.

In repeated sampling, the ith unit appears with relative frequency
$M_i/M$. Hence,

$$E(\bar{\bar{y}}_{III}) = \sum_{i=1}^{N} \frac{M_i}{M} \bar{Y}_i = \bar{\bar{Y}}$$

Further,

$$\bar{\bar{y}}_{III} - \bar{\bar{Y}} = (\bar{y}_i - \bar{Y}_i) + (\bar{Y}_i - \bar{\bar{Y}})$$

Average first over samples in which the ith unit is selected.

$$E(\bar{\bar{y}}_{III} - \bar{\bar{Y}})^2 = \frac{(M_i - m_i)}{M_i} \frac{S_i^2}{m_i} + (\bar{Y}_i - \bar{\bar{Y}})^2$$

Now average over all possible selections of the unit. Since the ith
unit is selected with relative frequency $M_i/M$,

$$V(\bar{\bar{y}}_{III}) = \frac{1}{M} \left\{ \sum_{i=1}^{N} (M_i - m_i) \frac{S_i^2}{m_i} + \sum_{i=1}^{N} M_i (\bar{Y}_i - \bar{\bar{Y}})^2 \right\} \qquad (11.3)$$

Note that, as in method I, the "between-units" component arises from
differences among the means per element $\bar{Y}_i$ in the successive
units. If these means per element are all nearly equal, this component
is small.

Example. Let us apply these results to a small population,
artificially constructed. The data are presented in table 11.1. There
are three units, with 2, 4, and 6 elements, respectively. The reader
may verify the figures given for $Y_i$, $S_i^2$, and $\bar{Y}_i$. The
population mean $\bar{\bar{Y}}$ is 33/12, or 2.75. The unweighted mean
of the $\bar{Y}_i$ is $2.167 = \bar{Y}_u$, so that the bias in method I
is -0.583. Its square, the contribution to the variance, is 0.340.

TABLE 11.1 ARTIFICIAL POPULATION WITH UNITS OF UNEQUAL SIZES

Unit   $y_{ij}$             $M_i$   $Y_i$   $S_i^2$   $\bar{Y}_i$   $\bar{Y}_i - \bar{\bar{Y}}$
1      0, 1                   2       1     0.500       0.5          -2.25
2      1, 2, 2, 3             4       8     0.667       2.0          -0.75
3      3, 3, 4, 4, 5, 5       6      24     0.800       4.0          +1.25

Totals                       12      33


One unit is to be selected and two elements sampled from it. We
consider four methods, two of which are variants of method I.

Method Ia.
Selection: unit with equal probability, $m_i = 2$.
Estimate: $\bar{y}_i$ (biased).

Method Ib.
Selection: unit with equal probability, $m_i = \frac{1}{2} M_i$.
Estimate: $\bar{y}_i$ (biased).

Method II.
Selection: unit with equal probability, $m_i = 2$.
Estimate: $N M_i \bar{y}_i / M$ (unbiased).

Method III.
Selection: unit with probability $M_i/M$, $m_i = 2$.
Estimate: $\bar{y}_i$ (unbiased).

Method Ib (proportional subsampling) does not guarantee a sample
size of 2 (it may be 1, 2, or 3), but the average sample size is 2.

By application of the sampling error formulas (11.1), (11.2), and 
(11.3), we obtain the results in table 11.2. 


TABLE 11.2 VARIANCES OF SAMPLE ESTIMATES OF $\bar{\bar{Y}}$

               Contribution to variance from
Method    Within units   Between units   Bias     Total variance
Ia           0.145          2.056        0.340       2.541
Ib           0.183          2.056        0.340       2.579
II           0.256          5.792        0.000       6.048
III          0.189          1.813        0.000       2.002


Although the example is artificial, the results are typical of those 
found in comparisons made on many populations. Method III gives 
the smallest variance because it has the smallest contribution from 
variation between units. Method II, although unbiased, is very in- 
ferior. Method Ia (equal size of subsample) is slightly better than 
method Ib (proportional subsampling). 
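
The entries of table 11.2 can be recomputed directly from formulas
(11.1), (11.2), and (11.3). The following Python sketch does this for
the artificial population of table 11.1; the function and variable
names are our own and are not part of the original text. The results
agree with the table to within rounding in the third decimal place.

```python
# Artificial population of table 11.1: element values in each primary unit.
units = [[0, 1], [1, 2, 2, 3], [3, 3, 4, 4, 5, 5]]

N = len(units)
M_i = [len(u) for u in units]
M = sum(M_i)
Y_i = [sum(u) for u in units]
Ybar_i = [Y / Mi for Y, Mi in zip(Y_i, M_i)]
# Within-unit variances S_i^2, with divisor (M_i - 1).
S2_i = [sum((y - yb) ** 2 for y in u) / (len(u) - 1)
        for u, yb in zip(units, Ybar_i)]

Ybar_elem = sum(Y_i) / M     # population mean per element
Ybar_unit = sum(Y_i) / N     # population mean per primary unit
Ybar_u = sum(Ybar_i) / N     # unweighted mean of the unit means


def var_method_I(m):
    """(within, between, bias) components of formula (11.1)."""
    within = sum((Mi - mi) / Mi * S2 / mi
                 for Mi, mi, S2 in zip(M_i, m, S2_i)) / N
    between = sum((yb - Ybar_u) ** 2 for yb in Ybar_i) / N
    bias = (Ybar_u - Ybar_elem) ** 2
    return within, between, bias


def var_method_II(m):
    """Components of formula (11.2); the estimate is unbiased."""
    within = N / M**2 * sum(Mi * (Mi - mi) * S2 / mi
                            for Mi, mi, S2 in zip(M_i, m, S2_i))
    between = N / M**2 * sum((Y - Ybar_unit) ** 2 for Y in Y_i)
    return within, between, 0.0


def var_method_III(m):
    """Components of formula (11.3); the estimate is unbiased."""
    within = sum((Mi - mi) * S2 / mi
                 for Mi, mi, S2 in zip(M_i, m, S2_i)) / M
    between = sum(Mi * (yb - Ybar_elem) ** 2
                  for Mi, yb in zip(M_i, Ybar_i)) / M
    return within, between, 0.0


table = {
    "Ia": var_method_I([2, 2, 2]),
    "Ib": var_method_I([Mi / 2 for Mi in M_i]),
    "II": var_method_II([2, 2, 2]),
    "III": var_method_III([2, 2, 2]),
}
for name, parts in table.items():
    print(name, [round(p, 3) for p in parts], round(sum(parts), 3))
```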

Some comparisons of these methods have also been made on actual 
populations. For six items (total workers, total agricultural workers, 
total non-agricultural workers, estimated separately for males and 
females) Hansen and Hurwitz (1943) found that method III produced 
large reductions in the contribution from variation between units as 
compared with the unbiased method II, and reductions which aver- 
aged 30 per cent as compared with method I. (They assumed the con- 
tribution from variation within units to be negligible.) In estimating 
typical farm items for the state of North Carolina, Jebe (1952) reported 
reductions in the total variance of the order of 15 per cent as compared 
with methods of type I. In both studies the primary unit was a county. 


11.3 Sampling with probability proportional to estimated size. It
may happen that the sizes $M_i$ of the units are known only roughly. In
the sampling of towns, where the unit is a block and the element a
household, the number of households per block is usually obtained
from city maps or from previous census data, both of which are more
or less out of date. Some of the advantages of pps sampling can be
retained by sampling with probability proportional to the best estimate
of size. Let $z_i$ be the probability assigned to the ith unit, where
the $z_i$ are any set of positive numbers that add to unity. We still
assume n = 1.

Method IV. An unbiased estimate of $\bar{\bar{Y}}$ is

$$\bar{\bar{y}}_{IV} = \frac{M_i \bar{y}_i}{z_i M} \qquad (11.4)$$

This follows because, in repeated sampling, the ith unit appears with
relative frequency $z_i$, so that

$$E(\bar{\bar{y}}_{IV}) = \sum_{i=1}^{N} z_i \left( \frac{M_i \bar{Y}_i}{z_i M} \right) = \frac{\sum_{i=1}^{N} M_i \bar{Y}_i}{M} = \bar{\bar{Y}}$$

With this method it is customary to select $m_i$ so that

$$m_i = \frac{k M_i}{z_i} \qquad (11.5)$$

Then

$$\bar{\bar{y}}_{IV} = \frac{m_i \bar{y}_i}{kM} = \frac{y_i}{kM} \qquad (11.6)$$

where $y_i$ is the sample total.

The quantity k may be described as the expected overall sampling
fraction. For

$$E(m_i) = \sum_{i=1}^{N} z_i m_i = k \sum_{i=1}^{N} M_i = kM$$

Hence

$$k = \frac{\text{Expected number of elements in sample}}{\text{Number of elements in population}}$$

The advantage in choosing $m_i = kM_i/z_i$ does not become apparent
in the present simple case. With n > 1, and with stratified sampling,
it will be seen later that this choice makes the estimate
self-weighting.

Example. This illustrates how $m_i$ is determined. The stratum
contains 6 blocks, with estimated numbers of households as shown in
table 11.3. Suppose that we intend to have an overall sampling
fraction of 5 per cent. We take k = 1/20. As with ordinary pps
sampling, a unit is first selected by drawing a random number between
1 and 121. Let this be 96. The block selected is no. 5, which covers
the range from 83 to 105 in the cumulation.

TABLE 11.3 ILLUSTRATION OF THE CALCULATION OF $m_i$

          Estimated no.    Cumulative    Assigned
Block     households       sum           range
1             10              10           1-10
2             30              40          11-40
3             17              57          41-57
4             25              82          58-82
5             23             105          83-105
6             16             121         106-121

The interviewer visits this block and prelists or counts all the
households on it. We shall suppose that he finds 31 households.
Applying equation (11.5), he takes

$$m_i = \frac{k M_i}{z_i} = \left( \frac{1}{20} \right) (31) \left( \frac{121}{23} \right) = 8, \text{ to the nearest integer}$$

The desired subsampling ratio

$$\frac{m_i}{M_i} = \frac{k}{z_i} = \frac{121}{(20)(23)}$$

is known before the block is listed. Thus the interviewer can be told
in advance to take 1 household in 4 from this block. This rule is
convenient when the subsample is to be systematic, as is often the case
in practice.
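
The selection procedure of this example can be sketched in Python as
follows. The helper names `select_block` and `subsample_size` are
hypothetical, and exact fractions are used so that the arithmetic of
equation (11.5) is reproduced without rounding until the last step. In
actual use the random number would come from a call such as
`random.randint(1, total)`; here the value 96 of the text is supplied
by hand.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate

estimated_sizes = [10, 30, 17, 25, 23, 16]        # table 11.3
cumulative = list(accumulate(estimated_sizes))    # 10, 40, 57, 82, 105, 121
total = cumulative[-1]


def select_block(r):
    """Return the 1-based block whose assigned range contains r."""
    if not 1 <= r <= total:
        raise ValueError("random number outside the cumulated range")
    # First cumulative sum that is >= r marks the selected block.
    return bisect_left(cumulative, r) + 1


def subsample_size(k, listed_size, block):
    """m_i = k * M_i / z_i, with z_i taken from the estimated sizes."""
    z = Fraction(estimated_sizes[block - 1], total)
    return k * listed_size / z


block = select_block(96)
m_exact = subsample_size(Fraction(1, 20), 31, block)
print(block, m_exact, round(m_exact))
```

The ratio `m_exact / 31` equals $k/z_i$ and does not depend on the
listed size, which is why the interviewer's instruction can be fixed
before the block is visited.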
The variance of $\bar{\bar{y}}_{IV}$ is obtained in the usual way.
Write

$$\bar{\bar{y}}_{IV} - \bar{\bar{Y}} = \frac{M_i \bar{y}_i}{z_i M} - \bar{\bar{Y}} \qquad \text{[by (11.4)]}$$

$$= \frac{1}{M} \left[ \frac{M_i}{z_i} (\bar{y}_i - \bar{Y}_i) + \left( \frac{Y_i}{z_i} - Y \right) \right]$$

In the variance, each square receives a weight $z_i$. Hence,

$$V(\bar{\bar{y}}_{IV}) = \frac{1}{M^2} \left[ \sum_{i=1}^{N} \frac{M_i (M_i - m_i) S_i^2}{z_i m_i} + \sum_{i=1}^{N} z_i \left( \frac{Y_i}{z_i} - Y \right)^2 \right] \qquad (11.7)$$

If $k = m_i z_i / M_i$ this may be written in the slightly simpler form

$$V(\bar{\bar{y}}_{IV}) = \frac{1}{M^2} \left[ \sum_{i=1}^{N} \frac{(M_i - m_i) S_i^2}{k} + \sum_{i=1}^{N} z_i \left( \frac{Y_i}{z_i} - Y \right)^2 \right] \qquad (11.7')$$

If $z_i = M_i/M$, formula (11.7) reduces to formula (11.3) for
$V(\bar{\bar{y}}_{III})$. If $z_i = 1/N$ (initial probabilities equal),
formula (11.7) reduces to formula (11.2) for the variance of the
unbiased estimate when probabilities are equal.

Unless $z_i = M_i/M$, the "between-units" component in (11.7) is
affected to some extent by variations in the sizes $M_i$ as well as by
variations in the means per element $\bar{Y}_i$.



Example. Table 11.4 shows the basic computations for finding
$V(\bar{\bar{y}}_{IV})$ in the artificial population in table 11.1.
Since M = 12 and the desired sample size is 2, we put k = 1/6. The
$z_i$ have been taken as 0.2, 0.4, and 0.4.

TABLE 11.4 COMPUTATION OF $V(\bar{\bar{y}}_{IV})$

Unit  $M_i$  $M_i/M$  $z_i$  $m_i = M_i/6z_i$  $6(M_i - m_i)$  $S_i^2$  $Y_i$  $Y_i/z_i$  $Y_i/z_i - Y$
1       2     0.17     0.2        5/3               2          0.500      1       5         -28
2       4     0.33     0.4        5/3              14          0.667      8      20         -13
3       6     0.50     0.4        5/2              21          0.800     24      60         +27

In practice, the $m_i$ are rounded to integers. This has not been done
in the present illustration. From formula (11.7'), the variance comes
out as follows:

Contribution from "within-units" = $6 \sum (M_i - m_i) S_i^2 / M^2$ = 0.188

Contribution from "between-units" = $\sum z_i \left( \frac{Y_i}{z_i} - Y \right)^2 / M^2$ = 3.583

Total = 3.771
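
As a check on these figures, formula (11.7) can be evaluated
numerically. The sketch below (function name our own) also confirms the
two reductions stated earlier: with $z_i = M_i/M$ the result is the
method III variance, and with $z_i = 1/N$ it is the method II variance.
Small discrepancies from the printed tables in the third decimal place
are due to rounding.

```python
# Population quantities from table 11.1.
M_i = [2, 4, 6]
Y_i = [1, 8, 24]
S2_i = [0.5, 2 / 3, 0.8]
N = len(M_i)
M, Y = sum(M_i), sum(Y_i)


def var_method_IV(z, m):
    """(within, between) components of formula (11.7) for n = 1."""
    within = sum(Mi * (Mi - mi) * S2 / (zi * mi)
                 for Mi, mi, S2, zi in zip(M_i, m, S2_i, z)) / M**2
    between = sum(zi * (Yi / zi - Y) ** 2
                  for Yi, zi in zip(Y_i, z)) / M**2
    return within, between


# Example of table 11.4: k = 1/6 and m_i = k M_i / z_i (not rounded).
k = 1 / 6
z = [0.2, 0.4, 0.4]
m = [k * Mi / zi for Mi, zi in zip(M_i, z)]
within, between = var_method_IV(z, m)
print(round(within, 3), round(between, 3), round(within + between, 3))

# Reductions: pps gives method III, equal probabilities give method II.
var_pps = sum(var_method_IV([Mi / M for Mi in M_i], [2, 2, 2]))
var_equal = sum(var_method_IV([1 / N] * N, [2, 2, 2]))
print(round(var_pps, 3), round(var_equal, 3))
```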


Comparison with table 11.2 reveals that method IV has a lower 
variance than the unbiased method II in which the unit is chosen with 
equal probabilities, but method IV is decidedly inferior to method I 
or III. If the sizes were not known, method III (pps) could not be 
used, but the biased estimates obtained in method I from sampling 
with equal probabilities could be used. Apparently, in this example, 
method IV pays too high a price in order to obtain an unbiased estimate. 

The disappointing performance of method IV is not necessarily typi- 
cal. With closer estimates of size the method shows to more advantage. 
It is natural to consider, however, whether the sample mean (as in 
method I) would be better than the estimate adopted in method IV. 
This leads to the fifth method to be discussed. 


Method V. Units chosen with probability proportional to estimated size.

Estimate = $\bar{\bar{y}}_{V} = \bar{y}_i$ = sample mean.

The estimate is biased, since

$$E(\bar{\bar{y}}_{V}) = \sum_{i=1}^{N} z_i \bar{Y}_i = \bar{Y}_z \quad \text{(say)}$$

If the $z_i$ are good estimates, $\bar{Y}_z$ is close to the correct
mean $\bar{\bar{Y}} = \sum M_i \bar{Y}_i / M$, and the bias is small.

If we write

$$\bar{\bar{y}}_{V} - \bar{\bar{Y}} = (\bar{y}_i - \bar{Y}_i) + (\bar{Y}_i - \bar{Y}_z) + (\bar{Y}_z - \bar{\bar{Y}})$$

the three components of the variance work out as follows:

$$V(\bar{\bar{y}}_{V}) = \sum_{i=1}^{N} z_i \frac{(M_i - m_i)}{M_i} \frac{S_i^2}{m_i} + \sum_{i=1}^{N} z_i (\bar{Y}_i - \bar{Y}_z)^2 + (\bar{Y}_z - \bar{\bar{Y}})^2$$

Example. If the values of $z_i$ and $m_i$ are chosen as in table 11.4,
the reader may verify that the components of the variance of
$\bar{\bar{y}}_{V}$ are as shown in table 11.5.

TABLE 11.5 CONTRIBUTIONS TO THE VARIANCE IN METHOD V

Within units    Between units    Bias     Total variance
   0.178           1.800         0.062       2.040

This is superior to all methods except method III (pps) and is
almost as good as method III.
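
The verification suggested to the reader can be carried out as in the
following sketch, which evaluates the three components for the $z_i$
and $m_i$ of table 11.4. The variable names are ours; `Ybar_z` denotes
the expected value $\sum z_i \bar{Y}_i$ of the estimate.

```python
# Population quantities from table 11.1.
M_i = [2, 4, 6]
Ybar_i = [0.5, 2.0, 4.0]
S2_i = [0.5, 2 / 3, 0.8]
M = sum(M_i)
Ybar = sum(Mi * yb for Mi, yb in zip(M_i, Ybar_i)) / M   # mean per element

# Selection probabilities and subsample sizes as in table 11.4.
z = [0.2, 0.4, 0.4]
k = 1 / 6
m = [k * Mi / zi for Mi, zi in zip(M_i, z)]

Ybar_z = sum(zi * yb for zi, yb in zip(z, Ybar_i))       # E(estimate)
within = sum(zi * (Mi - mi) / Mi * S2 / mi
             for zi, Mi, mi, S2 in zip(z, M_i, m, S2_i))
between = sum(zi * (yb - Ybar_z) ** 2 for zi, yb in zip(z, Ybar_i))
bias_sq = (Ybar_z - Ybar) ** 2
total = within + between + bias_sq
print(round(within, 3), round(between, 3), round(bias_sq, 4), round(total, 3))
```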


11.4 Summary of methods for n = 1. The five methods of estimating
the mean per element $\bar{\bar{Y}}$ and their variances in the
numerical example are summarized in table 11.6.

TABLE 11.6 TWO-STAGE SAMPLING METHODS (n = 1)

          Probabilities in                Estimate                       Bias        Variance
Method    selecting units                 of $\bar{\bar{Y}}$             status      in example
I         Equal                           $\bar{y}_i$                    Biased      Ia: 2.541
                                                                                     Ib: 2.579
II        Equal                           $N M_i \bar{y}_i / M$          Unbiased    6.048
III       $M_i/M \propto$ size            $\bar{y}_i$                    Unbiased    2.002
IV        $z_i \propto$ estimated size    $M_i \bar{y}_i / z_i M$        Unbiased    3.771
V         $z_i \propto$ estimated size    $\bar{y}_i$                    Biased      2.040

11.5 Sampling methods when n > 1. In this section the principal 
sampling methods and estimates are described for the usual situation 
in which more than one unit is selected. The discussion is still re- 
stricted to a single stratum. 

Consider first the sampling of units with equal probability. As an
illustration, 20 pages were selected at random from the volume
American men of science. The number of biographies $M_i$ per page
varies in general from about 14 to 21. On each selected page, 2
biographies were chosen at random and the age of the scientist was
recorded. The data appear in table 11.7. The purpose is to estimate the
average age of the biographees in the complete volume.


TABLE 11.7 AGES OF 40 SCIENTISTS IN American men of science (n = 20, m = 2)

Unit               Ages              Total
no.     $M_i$   $y_{i1}$  $y_{i2}$   $y_i$     $M_i \bar{y}_i$

 1       15       47        30         77         577.5
 2       19       38        51         89         845.5
 3       19       43        45         88         836.0
 4       16       5?        4?         96         768.0
 5       16       59        45        104         832.0
 6       19       39        38         77         731.5
 7       18       43        43         86         774.0
 8       18       49        51        100         900.0
 9       18       45        35         80         720.0
10       18       46        59        105         945.0
11       20       71        64        135       1,350.0
12       18       35        46         81         729.0
13       19       61        54        115       1,092.5
14       19       45        87        132       1,254.0
15       18       31        38         69         621.0
16       16       64        39        103         824.0
17       16       63        47        110         880.0
18       19       36        33         69         655.5
19       19       61        39        100         950.0
20       19       54        34         88         836.0
Totals  359                         1,904      17,121.5


Method I, in which the sample mean provides the estimate, has two
analogues. First, we may take the ordinary sample mean,

$$\bar{\bar{y}} = \frac{\sum y_i}{\sum m_i} = \frac{1904}{40} = 47.6 \text{ years}$$

This estimate is biased. This is easily seen when m is constant, since
in that event

$$E(\bar{\bar{y}}) = \frac{1}{N} \sum_{i=1}^{N} \bar{Y}_i$$

whereas the population mean per element is
$\bar{\bar{Y}} = \sum M_i \bar{Y}_i / M$. The biased estimate gives too
much weight to the smaller units. If there is no correlation between
$M_i$ and $\bar{Y}_i$, the distorted weighting does not matter greatly,
and the bias is unimportant in large samples. But, if $M_i$ and
$\bar{Y}_i$ are correlated, as often happens, the bias may not be
negligible in large samples. In the present example we might anticipate
a small negative correlation between $M_i$ and $\bar{Y}_i$, because the
longer biographies, which cut down the number per page, tend to be
those of the older scientists.

This discussion suggests, as an alternative estimate in method I, the
weighted mean,

$$\bar{\bar{y}}_{I'} = \frac{\sum_{i=1}^{n} M_i \bar{y}_i}{\sum_{i=1}^{n} M_i} = \frac{17,121.5}{359} = 47.7 \text{ years}$$

This is a typical ratio estimate because both the numerator and the
denominator vary from sample to sample. As is characteristic of ratio
estimates, the bias is negligible in large samples. If the subsampling
is proportional, this estimate reduces to the ordinary sample mean and
coincides with that in method I.

When the m’s are all equal, the unweighted sample mean sometimes 
has a smaller variance than the weighted mean. In view of its greater 
liability to bias, however, the unweighted mean is more hazardous. 

The unbiased estimate (method II) is given by

$$\bar{\bar{y}}_{II} = \frac{N}{nM} \sum_{i=1}^{n} M_i \bar{y}_i$$

In this estimate the quantity $\sum M_i \bar{y}_i$, which is an
unbiased estimate of the total of the n units in the sample, is raised
by the inverse N/n of the sampling fraction, and then divided by the
total number M of elements. In the present example the number of pages
in the book is 2823, and M (total number of biographies in the book) is
given in the preface as "about 50,000." Accepting this figure for
illustration, we have

$$\bar{\bar{y}}_{II} = \frac{2823}{(20)(50,000)} (17,121.5) = 48.3 \text{ years}$$

As in the case n = 1, this estimate often has poor precision if there
is much variation among the $M_i$.

Sampling with probability proportional to size, or to estimated size,
is unlikely to be adopted in this illustration, because of the work
involved in counting or estimating the numbers of entries on all 2823
pages. The estimates for these methods will be given in algebraic form.

As pointed out in section 9.10, we must sample with replacement in
order to keep the probabilities proportional to size or to estimated
size. If the same unit is drawn twice, the subsample is also taken with
replacement. In examining the bias of an estimate we shall use the same
mathematical device as in section 9.11. The random variate $t_i$
denotes the number of times that the ith unit appears in a specific
sample. In repeated sampling, $E(t_i) = nz_i$, where $z_i$ is the
probability of selection.
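
Returning for a moment to the equal-probability illustration of table
11.7, the three estimates quoted there (ordinary sample mean, weighted
mean, and unbiased estimate) can be reproduced from the columns $M_i$
and $y_i$ alone. The sketch below uses our own variable names, and, as
in the text, M = 50,000 is only the approximate figure accepted for
illustration.

```python
# Columns M_i and y_i (unit totals of the two ages) from table 11.7.
M_i = [15, 19, 19, 16, 16, 19, 18, 18, 18, 18,
       20, 18, 19, 19, 18, 16, 16, 19, 19, 19]
y_i = [77, 89, 88, 96, 104, 77, 86, 100, 80, 105,
       135, 81, 115, 132, 69, 103, 110, 69, 100, 88]

m = 2            # biographies sampled per page
n = len(M_i)     # pages in the sample
N = 2823         # pages in the volume
M = 50_000       # approximate number of biographies (from the preface)

unit_means = [y / m for y in y_i]
weighted_total = sum(Mi * yb for Mi, yb in zip(M_i, unit_means))

sample_mean = sum(y_i) / (n * m)             # ordinary sample mean
weighted_mean = weighted_total / sum(M_i)    # ratio-type weighted mean
unbiased = N / (n * M) * weighted_total      # method II

print(round(sample_mean, 1), round(weighted_mean, 1), round(unbiased, 1))
```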

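In method III (pps sampling and an equal m for all units in the
sample) the sample mean $\bar{\bar{y}}$ is unbiased. For

$$E(\bar{\bar{y}}_{III}) = \frac{1}{n} E(t_1 \bar{y}_1 + t_2 \bar{y}_2 + \cdots + t_N \bar{y}_N) = \frac{1}{n} \sum_{i=1}^{N} \frac{n M_i \bar{Y}_i}{M} = \bar{\bar{Y}}$$

In method IV (sampling with probability $\propto z_i$) an estimate that
is always unbiased, irrespective of the values assigned to the $m_i$,
is

$$\bar{\bar{y}}_{IV} = \frac{1}{nM} \sum_{i=1}^{n} \frac{M_i \bar{y}_i}{z_i}$$

For

$$E(\bar{\bar{y}}_{IV}) = \frac{1}{nM} E\left( \sum_{i=1}^{N} \frac{t_i M_i \bar{y}_i}{z_i} \right) = \frac{1}{M} \sum_{i=1}^{N} M_i \bar{Y}_i = \bar{\bar{Y}}$$

This estimate becomes self-weighting if we take

$$m_i = \frac{k M_i}{z_i} \qquad (11.5)$$

since it reduces to

$$\bar{\bar{y}}_{IV} = \frac{1}{nkM} \sum_{i=1}^{n} m_i \bar{y}_i = \frac{y}{nkM}$$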

In the case n = 1, the quantity k was the expected overall sampling
fraction. For n > 1, the expected overall sampling fraction is nk, for
from (11.5)

$$k = \frac{m_i z_i}{M_i} = \frac{\sum_{i=1}^{N} m_i z_i}{M} \qquad (11.8)$$

by summing the numerators and denominators of the series of equal
fractions. But the average number of elements in the sample is

$$E\left( \sum_{i=1}^{n} m_i \right) = E\left( \sum_{i=1}^{N} t_i m_i \right) = n \sum_{i=1}^{N} z_i m_i$$

Hence, from (11.8),

nk = (Expected number of elements in sample)/M

The overall sampling fraction (ratio of number of elements in the
sample to that in the population) should be distinguished from the
primary unit sampling fraction n/N and from the subsampling fractions
$m_i/M_i$.

As with n = 1, this estimate suffers to some extent the same kind of 
inflation of the variance as the unbiased method II. 

There are several possible extensions of method V when n > 1. The
estimate that seems least liable to serious bias is the weighted sample
mean

$$\bar{\bar{y}}_{V} = \frac{\sum_{i=1}^{n} M_i \bar{y}_i / z_i}{\sum_{i=1}^{n} M_i / z_i} \qquad (11.9)$$

The numerator is an unbiased estimate of nY, and the denominator an
unbiased estimate of nM. If $m_i = kM_i/z_i$, as in method IV, this
estimate becomes the unweighted sample mean $\bar{\bar{y}}$.


11.6 Summary of methods for n > 1. The methods are shown in
table 11.8. In view of the advantages of a self-weighting sample, the
values of the $m_i$ which make the estimates self-weighting are also
given.

TABLE 11.8 TWO-STAGE SAMPLING METHODS (n > 1)

         Probabilities                          Estimate of $\bar{\bar{Y}}$
         in selecting                                                              Self-               Bias
Method   units          $m_i$         General                                      weighting           status
I        Equal          m                                                          $\bar{\bar{y}}$     Biased
I'       Equal          $kNM_i$       $\sum M_i \bar{y}_i / \sum M_i$              $\bar{\bar{y}}$     Biased
II       Equal          $kNM_i$       $\frac{N}{nM} \sum M_i \bar{y}_i$            $y/nkM$             Unbiased
III      $M_i/M$        m                                                          $\bar{\bar{y}}$     Unbiased
IV       $z_i$          $kM_i/z_i$    $\frac{1}{nM} \sum M_i \bar{y}_i / z_i$      $y/nkM$             Unbiased
V        $z_i$          $kM_i/z_i$    $\sum (M_i \bar{y}_i / z_i) / \sum (M_i / z_i)$   $\bar{\bar{y}}$   Biased

In methods IV and V, these values are $m_i = kM_i/z_i$. In methods I'
and II, proportional subsampling produces a self-weighting estimate:
this proportionality is denoted by writing $m_i = kNM_i$. Since
$z_i = 1/N$ when sampling is with equal probabilities, the symbol k has
the same meaning in table 11.8 for methods I', II, IV, and V.

Where two algebraic expressions for an estimate are shown (as in
methods I', II, IV, and V), the first is the general form of the
estimate, which applies for any choice of the $m_i$: the second is the
self-weighting form which holds only when the $m_i$ are chosen as in
the preceding column.
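
A small numerical check may make the self-weighting property concrete.
In the sketch below the population sizes, the probabilities $z_i$, and
the sample values are all invented for illustration. With
$m_i = kM_i/z_i$ the general forms for methods IV and V coincide with
$y/nkM$ and with the unweighted sample mean, respectively, and nkM
equals the expected number of elements in the sample.

```python
from fractions import Fraction as F

# Hypothetical population of three primary units.
M_i = [20, 40, 60]
z = [F(1, 4), F(1, 4), F(1, 2)]      # selection probabilities, sum to 1
M = sum(M_i)
k = F(1, 20)
m_i = [k * Mi / zi for Mi, zi in zip(M_i, z)]   # (11.5): 4, 8, 6

# Hypothetical sample of n = 2 draws: units 0 and 2 with their subsamples.
sample = {0: [3, 5, 4, 8], 2: [10, 12, 9, 11, 13, 5]}
n = len(sample)
assert all(len(vals) == m_i[i] for i, vals in sample.items())

ybar = {i: F(sum(vals), len(vals)) for i, vals in sample.items()}
y = sum(sum(vals) for vals in sample.values())     # sample total
m = sum(len(vals) for vals in sample.values())     # elements in sample

weighted_sum = sum(M_i[i] * ybar[i] / z[i] for i in sample)

general_IV = weighted_sum / (n * M)                # general form, method IV
self_weighting_IV = y / (n * k * M)                # y / nkM

general_V = weighted_sum / sum(M_i[i] / z[i] for i in sample)   # (11.9)
sample_mean = F(y, m)                              # unweighted sample mean

expected_elements = n * sum(zi * mi for zi, mi in zip(z, m_i))
print(general_IV, self_weighting_IV, general_V, sample_mean, expected_elements)
```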


11.7 The estimation of proportions. The estimates in table 11.8 can
also be applied when the object is to estimate the proportion of
elements in the population which fall in some defined class C (e.g.
proportion of people aged 21 and over). We need only adopt the usual
device of defining $y_{ij}$ as 1 if the element is in C and as 0
otherwise.

If the denominator of the proportion does not include all the elements
in the population, the situation is different. In a general survey
covering both sexes, an example of this kind of proportion is

$$\frac{\text{Number of employed males over 14 years}}{\text{Total number of males over 14 years}}$$

If we let $y_{ij} = 1$ when the element is an employed male over 14 and
0 otherwise, and $x_{ij} = 1$ when the element is a male over 14 and 0
otherwise, the population proportion is the ratio Y/X, so that ratio
estimates are involved.

We shall consider a ratio estimate which is a generalization of
method V in table 11.8. Let

$$\hat{R} = \frac{\sum_{i=1}^{n} M_i \bar{y}_i / z_i}{\sum_{i=1}^{n} M_i \bar{x}_i / z_i} \qquad (11.10)$$

The numerator and denominator are unbiased estimates of nY and
nX, respectively (by method IV in table 11.8). If $m_i = kM_i/z_i$,
this estimate reduces to the simple form $\hat{R} = y/x$.

As will be seen, this type of estimate is very useful in presenting the
theory both for proportions and for continuous variates. Its variance
is given in section 11.9.
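
To illustrate formula (11.10) for a proportion of this kind, the
following sketch computes $\hat{R}$ from 0-1 indicator data. The
function name, the tuple layout, and all of the data are hypothetical.
The subsample sizes satisfy $m_i = kM_i/z_i$, so the general form and
the simple form y/x agree.

```python
from fractions import Fraction as F


def ratio_estimate(sample):
    """Formula (11.10). `sample` holds one tuple
    (M_i, z_i, y_values, x_values) per selected unit (hypothetical layout)."""
    num = sum(M * F(sum(y), len(y)) / z for M, z, y, x in sample)
    den = sum(M * F(sum(x), len(x)) / z for M, z, y, x in sample)
    return num / den


k = F(1, 8)
# y = 1 for an employed male over 14; x = 1 for any male over 14.
sample = [
    (8, F(1, 4), [1, 0, 0, 1], [1, 1, 0, 1]),
    (20, F(1, 2), [1, 0, 0, 1, 0], [1, 0, 1, 1, 0]),
]
# The subsample sizes follow m_i = k M_i / z_i, i.e. 4 and 5.
assert all(len(y) == k * M / z for M, z, y, x in sample)

R_hat = ratio_estimate(sample)
y_total = sum(sum(y) for _, _, y, _ in sample)
x_total = sum(sum(x) for _, _, _, x in sample)
simple_form = F(y_total, x_total)
print(R_hat, simple_form)
```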


11.8 Interim comments. In describing the methods of sampling and
estimation, we have already met two general results. The first is that,
for a given number of primary units n in the sample, sampling with pps
often gives the most precise estimates. The second is that, whether
the primary units are chosen with equal or unequal probabilities, the
biased methods of estimation based on the weighted or unweighted
sample means per element are often more precise than the unbiased
estimates that can be constructed from the same data. Neither of
these results is a mathematical certainty, but experience as well as
some theoretical investigations suggest that they hold in practice for
many populations and many types of item.

The contents of the remainder of this chapter are as follows: 

In section 11.9 some general variance formulas are developed which 
cover all the methods discussed here. 

Section 11.10 presents a method, due to Hansen and Hurwitz (1949), 
for determining the optimum probabilities of selection of primary 
units when field costs are taken into account. This method also de- 
termines the optimum sampling and subsampling fractions. 

Section 11.11 shows the type of relation which must hold between 
the primary unit total Y; and the primary unit size M;, in order that 
the methods of estimation based on the sample mean be more precise 
than the corresponding unbiased estimates. 

In section 11.12 general formulas are obtained for finding sample
estimates of variance. Section 11.13 indicates how the methods are
extended to stratified sampling.


11.9 The principal variance formulas. In order to avoid working out
individual formulas for each method, we may note that method V
reduces to method III if $z_i = M_i/M$ and to method I' if $z_i = 1/N$
(except that with equal probabilities we would sample with replacement).
Similarly, method IV reduces to method II when $z_i = 1/N$.
Consequently, variance formulas for methods V and IV provide most of
the required information. Method I, although occasionally useful, will
be omitted in view of its liability to serious bias.

The work can be reduced further by considering the ratio estimate
presented in formula (11.10). Units are selected with probability
proportional to some estimate of size $z_i$. The estimate $\hat{R}$ is

$$\hat{R} = \frac{\sum_{i=1}^{n} M_i \bar{y}_i / z_i}{\sum_{i=1}^{n} M_i \bar{x}_i / z_i} \qquad (11.10)$$

If $x_{ij}$ is a "dummy" variate that has the value 1 for every element
in the population, so that $\bar{x}_i = 1$, $\hat{R}$ reduces to
$\bar{\bar{y}}_{V}$ as given in table 11.8, and therefore includes
methods I', III, and V as particular cases.

The ratio estimate is also useful in its own right. If $x_{ij}$ is the
value of $y_{ij}$ at a previous census, we can form ratio estimates of
Y and $\bar{\bar{Y}}$ as

$$\hat{Y}_R = \hat{R} X: \qquad \hat{\bar{\bar{Y}}}_R = \hat{R} \bar{\bar{X}}$$

This type of estimate was found to be more precise than any of the
preceding methods for farm items in North Carolina (L. H. Madow,
1950; Jebe, 1952).

In finding $V(\hat{R})$ we shall use one of the variance formulas
already established for the case n = 1. As a preliminary, we require a
well-known result for the variance of a mean in sampling with
replacement. The result is deliberately stated in rather general terms.

Lemma. For a specified method of sampling and estimation, the
sample estimate u is an unbiased estimate of some population
characteristic U, with variance $S_u^2$. Suppose that such a sample is
drawn n times, with replacement after each draw, yielding the estimates
$u_1, u_2, \cdots, u_n$. Then, if $\bar{u}$ is the arithmetic mean of
the $u_i$,

$$V(\bar{u}) = \frac{S_u^2}{n} \qquad (11.11)$$

Proof:

$$(\bar{u} - U) = \frac{1}{n} [(u_1 - U) + (u_2 - U) + \cdots + (u_n - U)]$$

Hence,

$$V(\bar{u}) = E(\bar{u} - U)^2 = \frac{1}{n^2} \sum_{i=1}^{n} E(u_i - U)^2 + \frac{2}{n^2} \sum_{i < j} E(u_i - U)(u_j - U)$$

In sampling with replacement, the value obtained for $u_i$ is not
influenced by the value obtained for $u_j$ $(i \neq j)$. If we keep
$u_j$ constant in any cross-product, and average over $u_i$, the
average vanishes since $E(u_i - U) = 0$. Also each squared term
contributes $S_u^2$. Hence

$$V(\bar{u}) = \frac{1}{n^2} n S_u^2 = \frac{S_u^2}{n}$$

Note that the lemma does not specify how the sample is drawn: the
drawing may be with either equal or arbitrary probabilities.
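
The lemma can be checked by complete enumeration for a small case. The
sketch below assumes a hypothetical single-draw estimate that takes the
values 5, 20, and 60 with probabilities 0.2, 0.4, and 0.4 (these happen
to be the quantities $Y_i/z_i$ of table 11.4, but any values would
serve). It lists every ordered set of n draws made with replacement and
computes the variance of the mean exactly.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical single-draw estimate: value u with probability p.
values = [F(5), F(20), F(60)]
probs = [F(1, 5), F(2, 5), F(2, 5)]

U = sum(p * u for p, u in zip(probs, values))                # E(u)
S2_u = sum(p * (u - U) ** 2 for p, u in zip(probs, values))  # V(u)


def variance_of_mean(n):
    """Exact V(u-bar) over all ordered samples of n independent draws."""
    total = F(0)
    for draw in product(range(len(values)), repeat=n):
        prob = F(1)
        for i in draw:
            prob *= probs[i]
        ubar = sum(values[i] for i in draw) / n
        total += prob * (ubar - U) ** 2
    return total


for n in (1, 2, 3, 4):
    print(n, variance_of_mean(n), S2_u / n)
```

Exact fractions are used so that the equality in (11.11) holds without
any rounding error.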


Theorem 11.1 A sample of n units is drawn with replacement, the
probability of selection assigned to the ith unit at any draw being
$z_i$. From each selected unit a subsample of size $m_i$ is drawn by
simple random sampling. The estimate is

$$\hat{R} = \frac{\sum_{i=1}^{n} M_i \bar{y}_i / z_i}{\sum_{i=1}^{n} M_i \bar{x}_i / z_i}$$

Then, in large samples,*

$$V(\hat{R}) \doteq \frac{1}{nX^2} \sum_{i=1}^{N} \left[ \frac{1}{z_i} (Y_i - RX_i)^2 + \frac{M_i (M_i - m_i)}{z_i} \frac{S_{di}^2}{m_i} \right] \qquad (11.12)$$

where

$$S_{di}^2 = \frac{1}{M_i - 1} \sum_{j=1}^{M_i} \{ (y_{ij} - Rx_{ij}) - (\bar{Y}_i - R\bar{X}_i) \}^2$$

Proof: In the discussion of method IV for the case n = 1 (section
11.3), we saw that $M_i \bar{y}_i / z_i M$ gave an unbiased estimate of
$\bar{\bar{Y}}$. We now have n such drawings, made with replacement.
Let the subscript i denote the ith member of the sample, where the same
unit may appear more than once. Then the n quantities

$$\frac{M_i}{z_i} \bar{y}_i = \hat{Y}_i \quad \text{(say)}$$

are each unbiased estimates of the population total Y. Let

$$\hat{Y}_n = \frac{1}{n} \sum_{i=1}^{n} \hat{Y}_i$$

with a similar definition for $\hat{X}_n$. Then, if R = Y/X is the
population ratio,

$$\hat{R} - R = \frac{\hat{Y}_n}{\hat{X}_n} - R = \frac{1}{\hat{X}_n} (\hat{Y}_n - R\hat{X}_n)$$

As with the ratio estimate in simple random sampling (chapter 6),
we assume, as an approximation, that the sample estimate $\hat{X}_n$ in
the denominator can be replaced by X. This gives

$$\hat{R} - R \doteq \frac{1}{X} (\hat{Y}_n - R\hat{X}_n) = \frac{1}{nX} \sum_{i=1}^{n} (\hat{Y}_i - R\hat{X}_i)$$

* The theorem does not reveal how large the sample must be. As with the
ordinary ratio estimate (chapter 6) the approximation is probably
adequate if the coefficients of variation of the numerator and
denominator of $\hat{R}$ are both less than 0.1, though further
research on this point is needed.

To this order of approximation, $\hat{R}$ is an unbiased estimate of R.
By the lemma, applied to the variate $u_i = \hat{Y}_i - R\hat{X}_i$,

$$V(\hat{R}) \doteq \frac{1}{nX^2} V(\hat{Y}_i - R\hat{X}_i) \qquad (11.13)$$


But in the discussion of „method IV for n = 1, we find from formulas 
(11.7) and (11.7’) that since Ñ; = Miry 


Y; 3 MM; — m,) S 
(Pa =: (= — x) 4.5 Mle m8 
i=l Zi i=l Zi Mi 


Apply this result to the variate P; — RX; This variate equals 
M.d;/z;, where dij = yi; — Rx. Substitution in (11.18) gives 


. Y; — RX; 
V(R)*= Ea] jens ree! 
i=l Zi 
+ MAM; — mj) se) 
. i=l 2i mi; 
Since Y = RX, this may be written, 
1 i = 
VR) = elo -arp 4 5 MM m Sa 
nX? i=1 2 i=l Zi Mi 


where 


1 Mi 
S? = y & (Us — Raw) — (F: — REY}? 
e 


Corollary. If $m_i = kM_i/z_i$, so that $\hat{R}$ becomes simply y/x, then

$$V(\hat{R}) \doteq \frac{1}{nX^2}\sum_{i=1}^{N}\left[\frac{1}{z_i}(Y_i - RX_i)^2 + \frac{(M_i - m_i)}{k}S_{di}^2\right] \tag{11.12'}$$

As noted in section 11.9, this result provides variance formulas for methods V, III, and I' as particular cases.

Theorem 11.2 applies to the unbiased methods IV and II.


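Theorem 11.2 With the same method of selection as in theorem 11.1, the estimate of $\bar{\bar{Y}}$ is

$$\bar{\bar{y}}_{IV} = \frac{1}{nM}\sum_{i=1}^{n}\frac{M_i\bar{y}_i}{z_i}$$

Then

$$V(\bar{\bar{y}}_{IV}) = \frac{1}{nM^2}\sum_{i=1}^{N}\left[\frac{1}{z_i}(Y_i - z_iY)^2 + \frac{M_i(M_i - m_i)}{z_i}\frac{S_i^2}{m_i}\right] \tag{11.14}$$

Proof: By the lemma,

$$V(\bar{\bar{y}}_{IV}) = \frac{1}{nM^2}V\left(\frac{M_i\bar{y}_i}{z_i}\right) = \frac{1}{nM^2}\sum_{i=1}^{N}\left[\frac{1}{z_i}(Y_i - z_iY)^2 + \frac{M_i(M_i - m_i)}{z_i}\frac{S_i^2}{m_i}\right]$$

by (11.7).

Corollary. If $m_i = kM_i/z_i$ so that $\bar{\bar{y}}_{IV} = y/nkM$, then

$$V(\bar{\bar{y}}_{IV}) = \frac{1}{nM^2}\sum_{i=1}^{N}\left[\frac{1}{z_i}(Y_i - z_iY)^2 + \frac{(M_i - m_i)}{k}S_i^2\right] \tag{11.14'}$$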


Variance formulas (11.12’) and (11.14’) are structurally rather simi- 
lar. Apart from multiplying factors, the principal difference is that in 
the ratio estimate the variate $(y_{ij} - Rx_{ij})$ replaces the variate $y_{ij}$
which appears in the unbiased estimate. The formula for the ratio
estimate is approximate; that for the unbiased estimate is exact, pro- 
vided that sampling is with replacement. 
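
The exactness of the formula in theorem 11.2 can be checked on a very small population. The following sketch is illustrative only: it borrows the two-unit population and the probabilities 0.55 and 0.45 from exercise 11.2, while the subsample sizes `m = [3, 2]` and all function names are assumptions made here. It enumerates every possible outcome of a single draw (n = 1) and compares the resulting variance with the formula, using exact fractions. For n draws with replacement the lemma divides the single-draw variance by n.

```python
from fractions import Fraction
from itertools import combinations


def s2(values):
    """Within-unit variance S_i^2, with divisor (M_i - 1)."""
    mean = Fraction(sum(values), len(values))
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def variance_formula(units, z, m, n=1):
    """Variance of the method IV estimate by theorem 11.2 (n draws with replacement)."""
    M = sum(len(u) for u in units)   # total number of elements
    Y = sum(sum(u) for u in units)   # population total
    total = Fraction(0)
    for u, z_i, m_i in zip(units, z, m):
        M_i, Y_i = len(u), sum(u)
        between = (Y_i - z_i * Y) ** 2 / z_i
        within = M_i * (M_i - m_i) * s2(u) / (z_i * m_i)
        total += between + within
    return total / (n * M ** 2)


def variance_by_enumeration(units, z, m):
    """Exact variance of M_i * ybar_i / (z_i * M) over every outcome of one draw."""
    M = sum(len(u) for u in units)
    outcomes = []  # pairs (probability, estimate)
    for u, z_i, m_i in zip(units, z, m):
        subsamples = list(combinations(u, m_i))
        for sub in subsamples:
            ybar_i = Fraction(sum(sub), m_i)
            outcomes.append((z_i / len(subsamples), len(u) * ybar_i / (z_i * M)))
    mean = sum(p * e for p, e in outcomes)
    return sum(p * (e - mean) ** 2 for p, e in outcomes)


units = [[0, 0, 1, 3, 4, 4], [0, 0, 2, 2]]   # y-values from exercise 11.2
z = [Fraction(11, 20), Fraction(9, 20)]      # selection probabilities 0.55, 0.45
m = [3, 2]                                   # assumed subsample sizes

formula = variance_formula(units, z, m)
enumerated = variance_by_enumeration(units, z, m)
print(formula, enumerated)
```

Exact rational arithmetic is used so that the two values can be compared for equality rather than to within rounding error.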


11.10 Optimum probabilities of selection. This analysis will be carried through for the ratio estimates. The analysis for the unbiased estimates is similar on account of the similar structure of the two variance formulas.

Given that the sizes, or good estimates of them, are available, what 
probabilities of selection should be allocated to the units? This ques- 
tion has been examined by Hansen and Hurwitz (1949). Their analy- 
sis is important both for its results and as an example of the technique. 
The cost function which they consider has three components: 


$c_u$ = Fixed cost per primary unit.
$c_e$ = Cost per element or subunit.
$c_l$ = Cost of listing one element in a selected unit and other costs that vary with the number of elements to be listed.

The third cost factor is included because the sampler must usually list the elements in any selected unit, and verify their number, in order to draw a subsample. Hence,

$$\text{Cost} = c_un + c_e\sum_{i=1}^{n} m_i + c_l\sum_{i=1}^{n} M_i$$

This formula is not adapted for our purpose, because the quantities $\sum m_i$ and $\sum M_i$ are random variables which depend on the units that happen to be selected. Instead, we consider the average cost. Units are selected with arbitrary probabilities $z_i$ and subsamples are chosen with $m_i = kM_i/z_i$. Now

$$E\left(\sum_{i=1}^{n} m_i\right) = \sum_{i=1}^{N} nz_im_i = nk\sum_{i=1}^{N} M_i = nkM$$

Similarly,

$$E\left(\sum_{i=1}^{n} M_i\right) = \sum_{i=1}^{N} nz_iM_i = n\sum_{i=1}^{N} z_iM_i$$

Hence the average cost is

$$C = c_un + c_enkM + c_ln\sum_{i=1}^{N} z_iM_i \tag{11.15}$$

In attempting to minimize the variance for fixed average cost, the variables at our disposal are n, k, and the probabilities $z_i$.
By theorem 11.1, corollary, the variance to be minimized is

$$V(\hat{R}) = \frac{1}{nX^2}\sum_{i=1}^{N}\left[\frac{1}{z_i}(Y_i - RX_i)^2 + \frac{(M_i - m_i)}{k}S_{di}^2\right]$$

Some changes will be made in this expression in order to simplify the differentiations. We also substitute $Y_i = M_i\bar{Y}_i$, $X_i = M_i\bar{X}_i$. Further, since the $m_i$ are chosen to satisfy the equations

$$\frac{m_i}{k} = \frac{M_i}{z_i}$$

the quantity $m_i/k$ on the right may be replaced by $M_i/z_i$. These changes give

$$V = X^2V(\hat{R}) = \sum_{i=1}^{N}\frac{1}{n}\left[\frac{M_i^2}{z_i}(\bar{Y}_i - R\bar{X}_i)^2 - \frac{M_i}{z_i}S_{di}^2 + \frac{M_i}{k}S_{di}^2\right]$$

If we write $d_{ij} = (y_{ij} - Rx_{ij})$, the quantity $(\bar{Y}_i - R\bar{X}_i)$ is the mean per element of $d_{ij}$ in the ith unit, and may be denoted by $\bar{D}_i$. Combining the first two terms inside the square bracket, this gives

$$V = \sum_{i=1}^{N}\frac{1}{n}\left[\frac{M_i^2}{z_i}\left(\bar{D}_i^2 - \frac{S_{di}^2}{M_i}\right) + \frac{M_i}{k}S_{di}^2\right]$$


Finally, it will be noted that n appears only in the combinations $nz_i$ and $nk$. We introduce the variables $z_i' = nz_i$ and $k' = nk$, so that V is no longer an explicit function of n. Thus

$$V = \sum_{i=1}^{N}\left[\frac{M_i^2}{z_i'}\left(\bar{D}_i^2 - \frac{S_{di}^2}{M_i}\right) + \frac{M_i}{k'}S_{di}^2\right] \tag{11.16}$$

We are to minimize V with respect to variations in n, k', and the $z_i'$, subject to the restrictions that cost is fixed and that

$$\sum_{i=1}^{N} z_i = 1: \quad \text{i.e.} \quad \sum_{i=1}^{N} z_i' = n$$

Taking $\lambda$ and $\mu$ as undetermined multipliers, we minimize

$$V + \lambda\left(c_un + c_ek'M + c_l\sum_{i=1}^{N} z_i'M_i\right) + \mu\left(n - \sum_{i=1}^{N} z_i'\right)$$

Differentiation gives

$$n: \quad \lambda c_u + \mu = 0 \tag{11.17}$$

$$z_i': \quad -\frac{M_i^2}{z_i'^2}\left(\bar{D}_i^2 - \frac{S_{di}^2}{M_i}\right) + \lambda c_lM_i - \mu = 0 \tag{11.18}$$

i.e.

$$z_i'^2 = \frac{M_i^2\left(\bar{D}_i^2 - \dfrac{S_{di}^2}{M_i}\right)}{\lambda(c_u + c_lM_i)} \tag{11.19}$$

Since $\lambda$ is the same for all $z_i'$, these equations provide explicit solutions for the $z_i'$ and hence the $z_i = z_i'/n$.

The numerators of the $z_i'^2$ in (11.19) will be assumed positive. (If they are negative it is found that subsampling is inefficient, single-stage sampling of the primary units being superior.) For the optimum $z_i$ we have

$$z_i = \frac{\dfrac{M_iD_{iu}}{\sqrt{c_u + c_lM_i}}}{\displaystyle\sum_{i=1}^{N}\frac{M_iD_{iu}}{\sqrt{c_u + c_lM_i}}} \tag{11.20}$$

where

$$D_{iu}^2 = \bar{D}_i^2 - \frac{S_{di}^2}{M_i}$$

The quantity $D_{iu}^2$ must now be examined, since it may depend on the size of unit $M_i$. A quantity resembling $D_{iu}^2$ has been encountered previously (section 9.4) under a different notation. Consider a population in which all units are of the same size M. Perform an analysis of variance similar to table 9.5 (p. 196), on the variates $d_{ij} = y_{ij} - Rx_{ij}$.

TABLE 11.9 ANALYSIS OF VARIANCE FOR THE POPULATION

                                              df            ms
Between primary units                         (N - 1)       $S_b^2 = M\sum_{i=1}^{N}\bar{D}_i^2/(N - 1)$
Between elements within primary units         N(M - 1)      $S_w^2 = \sum_{i=1}^{N} S_{di}^2/N$

From the definition of $D_{iu}^2$, its average value over all units (assuming $M_i = M$) is

$$\bar{D}_u^2 = \frac{1}{N}\sum_{i=1}^{N}\bar{D}_i^2 - \frac{1}{NM}\sum_{i=1}^{N} S_{di}^2$$

Hence, if N is large, the average value of $D_{iu}^2$ may be written, from table 11.9,

$$\bar{D}_u^2 \doteq \frac{S_b^2 - S_w^2}{M} \tag{11.21}$$

In section 9.5 we studied how the functions $S_b^2$ and $S_w^2$ depend upon M. We found, as empirical formulas,

$$S_w^2 = AM^g \quad (g > 0)$$
$$S_b^2 = MS^2 - (M - 1)AM^g$$

where $S^2$ is the variance among all elements in the population. By equation (11.21), this gives

$$\bar{D}_u^2 = S^2 - AM^g \tag{11.22}$$

If M does not vary greatly, the assumption that $S_w^2$ is constant, i.e. g = 0 and hence $D_{iu}^2$ = constant, is often satisfactory, as was suggested in section 9.5. Otherwise, equation (11.22) shows that $D_{iu}^2$ may be expected to decrease as M increases. From their experience, Hansen and Hurwitz (1943) suggest that $D_{iu}^2$ will seldom, in practice, decrease as fast as 1/M.

We are now in a position to discuss the optimum choice of the $z_i$. From (11.20)

$$z_i \propto \frac{M_iD_{iu}}{\sqrt{c_u + c_lM_i}}$$

The following deductions may be drawn from this result:

i. Suppose that $c_lM_i$, the cost of listing and related operations per primary unit, is small relative to $c_u$, the fixed cost per primary unit. If $D_{iu}$ is constant, then selection with pps is optimum. If $D_{iu}$ decreases with increasing $M_i$, optimum probabilities lie between $z_i \propto M_i$ and $z_i \propto \sqrt{M_i}$.

ii. If the cost of listing predominates, optimum probabilities lie between $z_i \propto \sqrt{M_i}$ and $z_i$ = constant (equal probabilities).

iii. If costs of listing and fixed costs are of the same order of magnitude, $z_i \propto \sqrt{M_i}$ is a good compromise.

The optimum k, found by differentiation, is left as an exercise to the reader. Its value is

$$k = \frac{\sqrt{\sum M_iS_{di}^2}}{\sqrt{c_eM}\,\displaystyle\sum\frac{M_iD_{iu}}{\sqrt{c_u + c_lM_i}}}$$

The optimum n is found by solving the cost equation (11.15) for n.

As has been mentioned, the discussion in this section assumes that the sizes, or good estimates of them, are known. No part of the budget is allotted to obtaining information about the sizes of the N units, except for any listing needed in the units that comprise the sample.
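
The limiting cases of formula (11.20) can be explored numerically. The sketch below is only an illustration: the unit sizes, the constant value of $D_{iu}$, and the cost figures are hypothetical, and the function names are chosen here. It normalizes the weights $M_iD_{iu}/\sqrt{c_u + c_lM_i}$ to sum to one and evaluates the average cost (11.15).

```python
import math


def optimum_probabilities(sizes, d_u, c_u, c_l):
    """Probabilities proportional to M_i * D_iu / sqrt(c_u + c_l * M_i), as in (11.20)."""
    weights = [M_i * d / math.sqrt(c_u + c_l * M_i) for M_i, d in zip(sizes, d_u)]
    total = sum(weights)
    return [w / total for w in weights]


def average_cost(n, k, sizes, z, c_u, c_e, c_l):
    """Average cost (11.15): c_u*n + c_e*n*k*M + c_l*n*sum(z_i*M_i)."""
    M = sum(sizes)
    return c_u * n + c_e * n * k * M + c_l * n * sum(z_i * M_i for z_i, M_i in zip(z, sizes))


sizes = [100, 400, 900]        # hypothetical unit sizes M_i
d_u = [1.0, 1.0, 1.0]          # D_iu assumed constant across units

# No listing cost: the weights reduce to M_i, i.e. selection with pps.
z_pps = optimum_probabilities(sizes, d_u, c_u=1.0, c_l=0.0)

# Listing cost only: the weights reduce to sqrt(M_i).
z_sqrt = optimum_probabilities(sizes, d_u, c_u=0.0, c_l=1.0)

print(z_pps, z_sqrt)
print(average_cost(n=10, k=0.001, sizes=sizes, z=z_pps, c_u=5.0, c_e=1.0, c_l=0.01))
```

With both cost components present, the computed probabilities fall between these two extremes, which is the content of deductions i to iii.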


11.11 Biased versus unbiased estimates. In this section we examine 
the conditions under which the biased estimates based on the sample 
mean have a smaller variance than the corresponding unbiased meth- 
ods. Assuming an arbitrary probability of selection z; for the primary 
units, the comparison is between method V and method IV in table 
11.8. The estimates are: 


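$$\bar{\bar{y}}_V = \frac{\sum M_i\bar{y}_i/z_i}{\sum M_i/z_i} = \bar{\bar{y}} \quad (\text{if } m_i = kM_i/z_i)$$

$$\bar{\bar{y}}_{IV} = \frac{1}{nM}\sum\frac{M_i\bar{y}_i}{z_i} = \frac{y}{nkM} \quad (\text{if } m_i = kM_i/z_i)$$

Structurally, $\bar{\bar{y}}_V$ is analogous to a ratio estimate $\bar{y}'/\bar{x}'$, where $y_i' = M_i\bar{y}_i/z_i$ and $x_i' = M_i/z_i$, while $\bar{\bar{y}}_{IV} = \bar{y}'/M$ is analogous to a "mean per unit" estimate. As with an ordinary ratio estimate, we should therefore anticipate that the size of the correlation between $y_i'$ and $x_i'$ will decide whether $\bar{\bar{y}}_V$ is the more precise.

For simplicity, we assume $m_i = kM_i/z_i$ with both methods. From theorem 11.1, corollary, we have

$$V(\hat{R}) \doteq \frac{1}{nX^2}\sum_{i=1}^{N}\left[\frac{1}{z_i}(Y_i - RX_i)^2 + \frac{(M_i - m_i)}{k}S_{di}^2\right]$$

The variance $V(\bar{\bar{y}}_V)$ is found as a particular case of this result, by putting $x_{ij} = 1$. This gives

$$X_i = M_i: \quad X = M: \quad R = \frac{Y}{X} = \frac{Y}{M} = \bar{\bar{Y}}$$

$$S_{di}^2 = \frac{1}{M_i - 1}\sum_{j=1}^{M_i}(y_{ij} - \bar{Y}_i)^2 = S_i^2$$

Hence

$$V(\bar{\bar{y}}_V) \doteq \frac{1}{nM^2}\sum_{i=1}^{N}\left[\frac{1}{z_i}(Y_i - M_i\bar{\bar{Y}})^2 + \frac{(M_i - m_i)}{k}S_i^2\right]$$

For $V(\bar{\bar{y}}_{IV})$ we have, from theorem 11.2, corollary,

$$V(\bar{\bar{y}}_{IV}) = \frac{1}{nM^2}\sum_{i=1}^{N}\left[\frac{1}{z_i}(Y_i - z_iY)^2 + \frac{(M_i - m_i)}{k}S_i^2\right]$$

The within-unit contributions are the same in the two estimates.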


Hence $\bar{\bar{y}}_V$ has the smaller variance if

$$\sum_{i=1}^{N}\frac{1}{z_i}(Y_i - M_i\bar{\bar{Y}})^2 < \sum_{i=1}^{N}\frac{1}{z_i}(Y_i - z_iY)^2$$

This may be written

$$\sum_{i=1}^{N} z_i\left(\frac{Y_i}{z_i} - \frac{M_i}{z_i}\bar{\bar{Y}}\right)^2 < \sum_{i=1}^{N} z_i\left(\frac{Y_i}{z_i} - Y\right)^2 \tag{11.23}$$

Now let $Y_i' = Y_i/z_i$, $X_i' = M_i/z_i$ be variates defined for every primary unit. Since units are selected with probabilities $z_i$, we have

$$E(Y_i') = \sum z_i\frac{Y_i}{z_i} = Y$$

$$E(X_i') = \sum z_i\frac{M_i}{z_i} = M$$

Hence $\bar{\bar{Y}} = Y/M$ is the population ratio R' of $Y_i'$ to $X_i'$.

It follows that the right side of (11.23) is the population variance of $Y_i'$, whereas the left side is the population variance of the variate $(Y_i' - R'X_i')$.

By expanding the variance of $(Y_i' - R'X_i')$ we find, as with the ordinary ratio estimate (section 6.7), that $V(\bar{\bar{y}}_V)$ is smaller than $V(\bar{\bar{y}}_{IV})$ if

$$\rho > \frac{1}{2}\,\frac{\text{cv of } M_i/z_i}{\text{cv of } Y_i/z_i}$$

where $\rho$ is the correlation coefficient between $Y_i/z_i$ and $M_i/z_i$. When primary units are chosen with equal probability, this result reduces to the simpler condition

$$\rho_{Y_iM_i} > \frac{1}{2}\,\frac{\text{cv of } M_i}{\text{cv of } Y_i} \tag{11.24}$$

This condition shows that the relative precisions of the biased and unbiased estimates depend on the size of the correlation between the primary unit totals $Y_i$ and the primary unit sizes $M_i$. The comparison is, however, a large-sample one, in which the bias in the method V estimates has been ignored.
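
For equal-probability selection, condition (11.24) is easy to evaluate when the unit totals and sizes are known. The sketch below is an illustration under assumptions: the two small populations are hypothetical, and the function names are chosen here. It evaluates the correlation form (11.24) and, as a cross-check, the direct comparison of sums of squares in (11.23) with $z_i = 1/N$.

```python
import math


def moments(values):
    """Population mean and variance (divisor N)."""
    n = len(values)
    mean = sum(values) / n
    return mean, sum((v - mean) ** 2 for v in values) / n


def ratio_condition(totals, sizes):
    """True if rho exceeds half the ratio of the cv of M_i to the cv of Y_i, as in (11.24)."""
    mean_y, var_y = moments(totals)
    mean_m, var_m = moments(sizes)
    cov = sum((y - mean_y) * (s - mean_m) for y, s in zip(totals, sizes)) / len(totals)
    rho = cov / math.sqrt(var_y * var_m)
    threshold = 0.5 * (math.sqrt(var_m) / mean_m) / (math.sqrt(var_y) / mean_y)
    return rho > threshold


def direct_comparison(totals, sizes):
    """True if the left side of (11.23) is smaller than the right side when z_i = 1/N."""
    mean_per_element = sum(totals) / sum(sizes)
    mean_total = sum(totals) / len(totals)
    left = sum((y - s * mean_per_element) ** 2 for y, s in zip(totals, sizes))
    right = sum((y - mean_total) ** 2 for y in totals)
    return left < right


# Hypothetical population in which totals are nearly proportional to sizes.
proportional = ([20, 41, 59, 82], [10, 20, 30, 40])
# Hypothetical population in which totals bear little relation to sizes.
unrelated = ([10, 12, 11, 9], [1, 5, 2, 8])

for totals, sizes in (proportional, unrelated):
    print(ratio_condition(totals, sizes), direct_comparison(totals, sizes))
```

The two functions express the same large-sample inequality in different forms, so they agree whenever the mean of the unit totals is positive.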


11.12 Estimated variances. As we have seen, most of the estimates are particular cases either of the ratio estimate in section 11.9 or of the unbiased estimate in method IV, for which the true variances were presented in theorems 11.1 and 11.2. Sample estimates of these variances will now be given.


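Theorem 11.3 A sample of n primary units is drawn (with replacement) with probabilities $z_i$. From each selected unit a subsample is drawn by simple random sampling. The estimate is

$$\hat{R} = \frac{\sum_{i=1}^{n} M_i\bar{y}_i/z_i}{\sum_{i=1}^{n} M_i\bar{x}_i/z_i}$$

An unbiased estimate of the approximate variance of $\hat{R}$ is

$$v(\hat{R}) = \frac{1}{n^2X^2}\sum_{i=1}^{n}\left[\frac{M_i}{z_i}(\bar{y}_i - R\bar{x}_i)\right]^2 \tag{11.25}$$

where R is the population ratio Y/X.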

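Proof: The primary unit sample totals for the n successive drawings are denoted by $y_1, y_2, \ldots, y_n$. Note that, although the same unit may appear more than once, we give it a separate subscript each time it appears. Let

$$d_{ij} = y_{ij} - Rx_{ij}$$

Then

$$\frac{M_i}{z_i}(\bar{y}_i - R\bar{x}_i) = \frac{M_i}{z_i}\bar{d}_i = \frac{M_i\bar{D}_i}{z_i} + \frac{M_i}{z_i}(\bar{d}_i - \bar{D}_i) = \frac{D_i}{z_i} + \frac{M_i}{z_i}(\bar{d}_i - \bar{D}_i)$$

where

$$\bar{D}_i = \bar{Y}_i - R\bar{X}_i \quad \text{and} \quad D_i = M_i\bar{D}_i = Y_i - RX_i$$

Now square and average over all subsamples from the ith unit.

$$E\left(\frac{M_i\bar{d}_i}{z_i}\right)^2 = \left(\frac{D_i}{z_i}\right)^2 + \frac{M_i(M_i - m_i)}{z_i^2}\frac{S_{di}^2}{m_i}$$

Hence, for a fixed selection of n units,

$$E\sum_{i=1}^{n}\left(\frac{M_i\bar{d}_i}{z_i}\right)^2 = \sum_{i=1}^{n}\left[\frac{D_i^2}{z_i^2} + \frac{M_i(M_i - m_i)}{z_i^2}\frac{S_{di}^2}{m_i}\right] = \sum_{i=1}^{N} t_i\left[\frac{D_i^2}{z_i^2} + \frac{M_i(M_i - m_i)}{z_i^2}\frac{S_{di}^2}{m_i}\right]$$

where $t_i$ is the number of times that the ith unit has been drawn in any specific sample. When we average over all possible sets of n units, $E(t_i) = nz_i$. Hence

$$E\sum_{i=1}^{n}\left(\frac{M_i\bar{d}_i}{z_i}\right)^2 = n\sum_{i=1}^{N}\left[\frac{D_i^2}{z_i} + \frac{M_i(M_i - m_i)}{z_i}\frac{S_{di}^2}{m_i}\right]$$

But by theorem 11.1, formula (11.12), since $D_i = Y_i - RX_i$,

$$V(\hat{R}) \doteq \frac{1}{nX^2}\sum_{i=1}^{N}\left[\frac{D_i^2}{z_i} + \frac{M_i(M_i - m_i)}{z_i}\frac{S_{di}^2}{m_i}\right]$$

The result follows [note the divisor $n^2X^2$ in $v(\hat{R})$].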

Corollary. Theorem 11.3 is not usable as it stands, since R, and in some applications X also, are unknown. In place of R we put the sample estimate $\hat{R}$ and replace an n in the denominator by (n - 1), as was done with the ordinary ratio estimate in chapter 6. An unbiased estimate of X is

$$\hat{X} = \frac{1}{n}\sum_{i=1}^{n}\frac{M_i\bar{x}_i}{z_i}$$

Hence we take

$$v(\hat{R}) = \frac{1}{n(n - 1)\hat{X}^2}\sum_{i=1}^{n}\left[\frac{M_i}{z_i}(\bar{y}_i - \hat{R}\bar{x}_i)\right]^2 \tag{11.26}$$

Example. Compute the standard error of the estimated mean age by method I' in table 11.7 (p. 244). In this sample, 20 pages were selected with equal probabilities, with $m_i = 2$.

The estimate is

$$\bar{\bar{y}}_{I'} = \frac{\sum M_i\bar{y}_i}{\sum M_i} = \frac{17{,}121.5}{359} = 47.6922 \text{ years}$$

Some extra decimal places are retained to ensure accuracy in later computations.

To apply formula (11.26), we put

$$\bar{x}_i = 1: \quad \hat{R} = \bar{\bar{y}}_{I'}: \quad z_i = \frac{1}{N}: \quad m_i = 2: \quad \hat{X} = \frac{N}{n}\sum M_i$$

Substitution into formula (11.26) gives

$$v(\hat{R}) = \frac{n}{(n - 1)(\sum M_i)^2}\sum M_i^2(\bar{y}_i - \bar{\bar{y}}_{I'})^2 \tag{11.27}$$

The sum of squares is easily computed from table 11.7, p. 244, in the form

$$\sum (M_i\bar{y}_i)^2 - 2\bar{\bar{y}}_{I'}\sum M_i(M_i\bar{y}_i) + \bar{\bar{y}}_{I'}^2\sum M_i^2$$
$$= 15{,}375{,}020 - (95.3844)(309{,}747.5) + (2274.55)(6481)$$
$$= 571{,}300 \tag{11.28}$$

$$v(\hat{R}) = \frac{(20)(571{,}300)}{(19)(359)^2} = 4.67$$

$$s(\hat{R}) = 2.16 \text{ years}$$

Formula (11.27) can be seen to be identical [apart from the fpc (N - n)/N] with formula (6.14), p. 119, which was used to compute the variance of a ratio estimate in single-stage sampling. Hence we could have used formula (6.14) here by putting $y_i' = M_i\bar{y}_i$, $x_i' = M_i$, and calculating the variance for the ratio $\sum y_i'/\sum x_i'$. This is a particular case of a general result which was also noted in chapter 10. If n/N is negligible in two-stage sampling, estimated variances can be found by the appropriate formulas for single-stage sampling. This is fortunate, for, despite the relative complexity of two-stage sampling, the formulas for estimated variances remain simple.
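
The arithmetic of the example can be retraced from the four sums quoted from table 11.7. The sketch below is a check only, with variable names chosen here. It computes the ratio without intermediate rounding, so the sum of squares differs slightly from the rounded figure 571,300, while the variance and standard error agree to the two decimals quoted.

```python
import math

n = 20                       # number of pages in the sample
sum_M = 359                  # sum of M_i
sum_My = 17121.5             # sum of M_i * ybar_i
sum_My_sq = 15375020         # sum of (M_i * ybar_i)^2
sum_M_My = 309747.5          # sum of M_i * (M_i * ybar_i)
sum_M_sq = 6481              # sum of M_i^2

ratio = sum_My / sum_M       # estimated mean age by method I'

# Expanded form of sum M_i^2 (ybar_i - ratio)^2, as in (11.28).
sum_sq = sum_My_sq - 2 * ratio * sum_M_My + ratio ** 2 * sum_M_sq

variance = n * sum_sq / ((n - 1) * sum_M ** 2)   # formula (11.27)
std_error = math.sqrt(variance)

print(round(ratio, 4), round(sum_sq), round(variance, 2), round(std_error, 2))
```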
The following result gives the sample estimate of variance for the unbiased estimate in method IV.


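Theorem 11.4 With the same method of sampling as in theorem 11.3, the estimate of $\bar{\bar{Y}}$ is

$$\bar{\bar{y}}_{IV} = \frac{1}{nM}\sum_{i=1}^{n}\frac{M_i\bar{y}_i}{z_i}$$

Then an unbiased sample estimate of $V(\bar{\bar{y}}_{IV})$ is

$$v(\bar{\bar{y}}_{IV}) = \frac{1}{n(n - 1)M^2}\sum_{i=1}^{n}(\hat{Y}_i - \hat{Y}_n)^2 \tag{11.29}$$

where

$$\hat{Y}_i = \frac{M_i\bar{y}_i}{z_i}: \quad \hat{Y}_n = \frac{1}{n}\sum_{i=1}^{n}\hat{Y}_i = M\bar{\bar{y}}_{IV}$$

The proof is obtained by the same approach as in theorem 11.3.

Estimates of variance for all the methods discussed in this chapter can be obtained from these theorems. They are shown in table 11.10 for the self-weighting forms of the estimate. If the self-weighting conditions do not apply, the reader should obtain his sample estimates of variance directly from theorems 11.3 and 11.4.

TABLE 11.10 SAMPLE VARIANCES FOR ESTIMATES OF THE POPULATION MEAN (for a self-weighting sample)

Method   Unit probability   $m_i$         Estimate of $\bar{\bar{Y}}$   Sample estimate of variance
I'       Eq.                $kNM_i$       $\bar{\bar{y}}$               $\frac{n}{(n - 1)(\sum m_i)^2}\sum m_i^2(\bar{y}_i - \bar{\bar{y}})^2$
II       Eq.                $kNM_i$       $y/nkM$                       $\frac{1}{n(n - 1)(kM)^2}\sum (y_i - \bar{y})^2$
III      $M_i/M$            $m$           $\bar{\bar{y}}$               $\frac{1}{n(n - 1)}\sum (\bar{y}_i - \bar{\bar{y}})^2$
IV       $z_i$              $kM_i/z_i$    $y/nkM$                       $\frac{1}{n(n - 1)(kM)^2}\sum (y_i - \bar{y})^2$
V        $z_i$              $kM_i/z_i$    $\bar{\bar{y}}$               $\frac{n}{(n - 1)(\sum m_i)^2}\sum m_i^2(\bar{y}_i - \bar{\bar{y}})^2$

In the table, $y_i$ is the sample total in the ith selected unit, $y = \sum y_i$, $\bar{y} = y/n$, and $\bar{\bar{y}}$ is the sample mean per element.

The formulas in table 11.10 assume sampling with replacement, and ignore the fpc which applies in methods I' and II when units are chosen without replacement. The formulas are adequate provided that n/N is small.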
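
When the self-weighting conditions do not hold, formula (11.29) can be applied directly to the sample. The following sketch assumes a sample drawn with replacement and summarized, for each draw, by the unit size $M_i$, the selection probability $z_i$, and the subsample mean $\bar{y}_i$. The three-draw sample shown is hypothetical, and the function name is chosen here.

```python
import math


def method_iv_estimate(sample, M):
    """Return the method IV estimate of the mean per element and its estimated variance (11.29).

    sample: list of (M_i, z_i, ybar_i), one tuple per draw (a repeated unit appears once per draw).
    M: total number of elements in the population.
    """
    n = len(sample)
    totals = [M_i * ybar_i / z_i for M_i, z_i, ybar_i in sample]  # each estimates the total Y
    mean_total = sum(totals) / n
    estimate = mean_total / M
    variance = sum((t - mean_total) ** 2 for t in totals) / (n * (n - 1) * M ** 2)
    return estimate, variance


# Hypothetical sample of three draws from a population with M = 100 elements.
sample = [(10, 0.1, 2.0), (20, 0.2, 3.0), (30, 0.3, 2.5)]
estimate, variance = method_iv_estimate(sample, M=100)
print(estimate, variance, math.sqrt(variance))
```

Because the n draws are independent and each $M_i\bar{y}_i/z_i$ estimates the same total, the estimator is simply the usual variance of a mean applied to these n quantities.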


11.13 Extension to stratified sampling. For the unbiased methods II, III, and IV, the extension to stratified sampling is straightforward. The subscript h denotes the stratum: $M_h$ is the total number of elements in stratum h, and M is the total number of elements in the population. The estimated population mean is

$$\bar{\bar{y}}_{st} = \sum_{h=1}^{L} W_h\bar{\bar{y}}_h: \quad W_h = M_h/M \tag{11.30}$$

where $\bar{\bar{y}}_h$ denotes the estimate of the stratum mean per element. Further,

$$V(\bar{\bar{y}}_{st}) = \sum_{h=1}^{L} W_h^2V(\bar{\bar{y}}_h) \tag{11.31}$$

and the estimated variance is

$$v(\bar{\bar{y}}_{st}) = \sum_{h=1}^{L} W_h^2v(\bar{\bar{y}}_h) \tag{11.32}$$

Table 11.10, which gives the value of $v(\bar{\bar{y}}_h)$ for a single stratum, is useful in constructing these estimates of variance.

Table 11.11, which is an extension of part of table 11.8 to stratified 
sampling, presents the algebraic forms of the three unbiased estimates. 
The column headed $m_{hi}$ shows the subsample sizes which make the estimates self-weighting within strata. The self-weighting forms of the estimates are given in the right-hand column: the general forms apply if the $m_{hi}$ have not been chosen so as to make the estimate self-weighting within strata. In the summations within the table, h goes from 1 to L and i from 1 to $n_h$.


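TABLE 11.11 UNBIASED ESTIMATES IN STRATIFIED TWO-STAGE SAMPLING

         Unit probability                         Sample estimate of $\bar{\bar{Y}}$
Method   within strata      $m_{hi}$              General form                                                              Self-weighting form
II       Eq.                $k_hN_hM_{hi}$        $\frac{1}{M}\sum_h\frac{N_h}{n_h}\sum_i M_{hi}\bar{y}_{hi}$               $\frac{1}{M}\sum_h\frac{y_h}{n_hk_h}$
III      $M_{hi}/M_h$       $m_h$                 $\frac{1}{M}\sum_h\frac{M_h}{n_h}\sum_i\bar{y}_{hi}$                      $\frac{1}{M}\sum_h\frac{M_h}{n_hm_h}y_h$
IV       $z_{hi}$           $k_hM_{hi}/z_{hi}$    $\frac{1}{M}\sum_h\frac{1}{n_h}\sum_i\frac{M_{hi}\bar{y}_{hi}}{z_{hi}}$   $\frac{1}{M}\sum_h\frac{y_h}{n_hk_h}$

Here $y_h$ is the sample total in stratum h.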

The right-hand column of table 11.11 also shows the conditions under which the estimates become completely self-weighting. It will be recalled that $n_hk_h$ is the overall sampling fraction within stratum h. Hence, if the overall sampling fraction is kept constant in all strata, estimates II and IV are completely self-weighting. The same result holds for sampling with pps in method III, for with this method the number of elements $m_h$ chosen per primary unit is constant within a stratum, so that $n_hm_h/M_h$ is again the overall sampling fraction in stratum h.

With the biased estimates I’ and V, and the ratio estimate, we could 
use a weighted mean of the estimates for the individual strata, similar 
to formula (11.30). But since all three estimates are essentially ratio 
estimates, the biases may have the same sign in all strata, and if there 
are numerous strata, with small samples from each stratum, the bias
in the weighted mean may be substantial. This kind of estimate is to 
be recommended only if there are few strata, with large samples in 
each stratum, or if we have evidence that the overall bias will still be 
negligible. 


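The alternative is to make a combined ratio estimate, which will be illustrated for the most general case. Let

$$\hat{Y}_h = \frac{1}{n_h}\sum_{i=1}^{n_h}\frac{M_{hi}\bar{y}_{hi}}{z_{hi}}$$

with a corresponding definition for $\hat{X}_h$. The quantities $\hat{Y}_h$, $\hat{X}_h$ are unbiased sample estimates of the stratum totals $Y_h$, $X_h$, respectively. The combined ratio estimate is defined as

$$\hat{R}_c = \frac{\sum_h\hat{Y}_h}{\sum_h\hat{X}_h}$$

The approximate variance of $\hat{R}_c$ is found in the usual way by writing

$$\hat{R}_c - R \doteq \frac{1}{X}\sum_{h=1}^{L}(\hat{Y}_h - R\hat{X}_h) \tag{11.33}$$

The quantity $(\hat{Y}_h - R\hat{X}_h)$ is an unbiased method IV estimate of the stratum total $(Y_h - RX_h)$ of the variate $d_{hij} = y_{hij} - Rx_{hij}$. Hence $V(\hat{Y}_h - R\hat{X}_h)$ is found as a particular case of the variance of a method IV estimate (theorem 11.2). This leads to the result:

$$V(\hat{R}_c) \doteq \frac{1}{X^2}\sum_{h=1}^{L}\frac{1}{n_h}\sum_{i=1}^{N_h}\left[\frac{1}{z_{hi}}(D_{hi} - z_{hi}D_h)^2 + \frac{M_{hi}(M_{hi} - m_{hi})}{z_{hi}}\frac{S_{dhi}^2}{m_{hi}}\right] \tag{11.34}$$

where

$$D_{hi} = Y_{hi} - RX_{hi}: \quad D_h = Y_h - RX_h$$

$$S_{dhi}^2 = \frac{1}{M_{hi} - 1}\sum_{j=1}^{M_{hi}}\left\{(y_{hij} - Rx_{hij}) - (\bar{Y}_{hi} - R\bar{X}_{hi})\right\}^2$$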

In order to deduce the variances for methods V and I', we substitute $x_{hij} = 1$ in formula (11.34). This gives

$$X = M: \quad R = \bar{\bar{Y}}$$

$$D_{hi} = Y_{hi} - \bar{\bar{Y}}M_{hi}$$

$$S_{dhi}^2 = \frac{1}{M_{hi} - 1}\sum_{j=1}^{M_{hi}}(y_{hij} - \bar{Y}_{hi})^2$$

The resulting variance formula, like (11.34), is a ratio-type approximation and assumes that $n_h/N_h$ is small. Another particular case is:

Method III.

Probability proportional to size within strata. $z_{hi} = M_{hi}/M_h$.

Estimate: same as in table 11.11 if $m_{hi} = m_h$.

The variance formula can be deduced from either (11.31) or (11.34).

$$V(\bar{\bar{y}}_{IIIst}) = \frac{1}{M^2}\sum_{h=1}^{L}\frac{M_h}{n_h}\sum_{i=1}^{N_h}\left[M_{hi}(\bar{Y}_{hi} - \bar{\bar{Y}}_h)^2 + \frac{(M_{hi} - m_h)}{m_h}S_{hi}^2\right]$$

where $S_{dhi}^2 = S_{hi}^2$.


The choice of the optimum sampling ratios $n_h/N_h$ for units within strata can be attacked by the techniques given in chapter 5, but will not be discussed here.

For the estimated variance of $\hat{R}_c$, we have, from equation (11.33),

$$v(\hat{R}_c) = \frac{1}{X^2}\sum_{h=1}^{L} v(\hat{Y}_h - R\hat{X}_h)$$

By an application of theorem 11.4 to the variate $d_{hij}$ in the individual strata, we find

$$v(\hat{R}_c) = \frac{1}{X^2}\sum_{h=1}^{L}\frac{1}{n_h(n_h - 1)}\sum_{i=1}^{n_h}(d_{hi}' - \bar{d}_h')^2 \tag{11.35}$$

where

$$d_{hi}' = \frac{M_{hi}\bar{d}_{hi}}{z_{hi}}: \quad \bar{d}_h' = \frac{1}{n_h}\sum_{i=1}^{n_h} d_{hi}': \quad \bar{d}_{hi} = \bar{y}_{hi} - \hat{R}_c\bar{x}_{hi}$$

If X is not known, the sample estimate $\sum\hat{X}_h$ is substituted for it.


11.14 Summary comments. As the discussion in this chapter indicates, the efficient design of a two-stage sample with units of unequal size requires a good deal of preliminary work. This section recapitulates briefly the principal issues.

1. Find out whether the sizes are known, known approximately, or 
unknown. In the last case, consider whether some information about 
sizes can be obtained relatively easily. For example, Jessen et al. 
(1947) conducted two-stage samples of blocks in some Greek towns in 
which no usable estimates of the numbers of households per block were 
available. They considered three approaches: (i) Drawing the blocks 
with equal probabilities, (ii) Making a rapid tour of the town by jeep 
in order to tie together small blocks so as to build artificial blocks that 
appeared to have roughly the same numbers of households. Blocks
which obviously had no households were eliminated in this process.
The sample blocks were then chosen with equal probability. (iii) 


with probability proportional to estimated sizes. 

2. Consider whether to use size of unit as one of the variables for stratification: this is advisable unless it prevents the use of some other variable that might give a worth-while increase in precision.

3. Decide how the units are to be selected within strata. If sizes are known at least approximately, selection with pps, or its square root, will often be the best procedure, although this depends on the nature of the field costs.

4. Select a method of estimation. For estimating the population mean or total, a ratio estimate using the value of the same item at a recent census is sometimes very successful, if available. Estimates based on the sample mean or weighted sample mean are often more precise than the unbiased estimates.

5. Decide on the sampling and subsampling fractions within strata. We have recommended that subsampling fractions be chosen so that the estimates are self-weighting within strata.


g or 1 S for all possible samples which can be drawn from the artificial population in table 11.1, by methods Ia, Ib, II, and III, verify the total variances given in table 11.2.

11.2 A population contains 2 primary units, with 6 and 4 elements, respectively. The values of $y_{ij}$ are 0, 0, 1, 3, 4, 4 in the first unit, and 0, 0, 2, 2 in the second. One primary unit is chosen, and the expected sample size is to be 3 elements. Work out the contributions to the total variance of the estimated mean per element in methods Ia, II, III, IV, and V. In the last two cases, the $z_i$ are 0.55 and 0.45, respectively, and $m_i = kM_i/z_i$.

11.3 The elements in a population with 3 primary units are classified into 2 classes. The unit sizes $M_i$ and the proportions $P_i$ of elements which belong to the first class are as follows:

$$M_1 = 100, \quad M_2 = 200, \quad M_3 = 300: \quad P_1 = 0.40, \quad P_2 = 0.45, \quad P_3 = 0.35$$

For a sample consisting of 50 elements from 1 primary unit, compare the variances of methods Ia, II, and III for estimating the proportion of elements in the first class in the population. (In the variance formulas in section 11.2, $S_i^2$ is approximately $P_iQ_i$.)

11.4 A sample of n primary units is chosen with pps, and m elements are sampled from each unit in the sample. Deduce the formula for $V(\bar{\bar{y}}_{III})$ from both theorems 11.1 and 11.2. Is the variance formula exact?

11.5 A sample of n primary units is chosen with equal probabilities and without replacement. The unbiased estimate $\bar{\bar{y}}_{II}$ of $\bar{\bar{Y}}$ is used. Show that

$$\frac{M}{N}(\bar{\bar{y}}_{II} - \bar{\bar{Y}}) = \frac{1}{n}\sum_{i=1}^{n} M_i(\bar{y}_i - \bar{Y}_i) + (\bar{Y}_n - \bar{Y})$$

where $\bar{Y}_n$ is the true mean per primary unit for the n units in the sample and $\bar{Y} = Y/N$. Hence find the exact variance of $\bar{\bar{y}}_{II}$ and compare it with the variance deduced from theorem 11.2, which assumes sampling with replacement.

11.6 For the data in table 11.7, estimate from the sample the standard error of the unbiased estimate which was made of the mean age of entries in American men of science. (M may be taken as 50,000.)


CHAPTER 12
DOUBLE SAMPLING


12.1 Description of the technique. As we have seen, a number of sampling techniques depend upon the possession of advance information about an auxiliary variate $x_i$. Ratio and regression estimates require a knowledge of the population mean $\bar{X}$. If it is desired to stratify the population according to the values of the $x_i$, their frequency distribution must be known.

When such information is lacking, it is sometimes relatively cheap to take a large preliminary sample in which $x_i$ alone is measured. The purpose of this sample is to furnish a good estimate of $\bar{X}$ or of the frequency distribution of $x_i$. In a survey whose function is to make estimates for some other variate $y_i$, it may pay to devote part of the resources to this preliminary sample, although this means that the size of the sample in the main survey on $y_i$ must be decreased. This technique is known as double sampling or two-phase sampling. As the discussion implies, the technique is profitable only if the gain in precision from ratio or regression estimates or stratification more than offsets the loss in precision due to the reduction in the size of the main sample.

Double sampling may be very appropriate when the information about $x_i$ is on file cards that have not been tabulated. For instance, in surveys of the German civilian population in 1945, the sample from any town was usually drawn from rationing registration lists. In addition to geographic stratification within the town, for which data were usually already available, stratification by sex and age was proposed. Since the sample had to be drawn in a hurry, and since the lists were in constant use, tabulation of the complete age and sex distribution was not feasible. A moderately large systematic sample could, however, be drawn quickly. Each person drawn was classified into the appropriate age-sex class. From these data the much smaller list of persons to be interviewed was selected.


12.2 Double sampling for stratification. The theory was first given by Neyman (1938).



The population is to be stratified into a number of classes according to the values of $x_i$. The first sample is a simple random sample of size n'. Let:

$W_h = N_h/N$ = proportion of population falling into stratum h.
$w_h = n_h'/n'$ = proportion of first sample falling into stratum h.

Then $w_h$ is an estimate of $W_h$.

The second sample is a stratified random sample of size n in which $y_i$ is measured: $n_h$ units are drawn from stratum h. The second sample is often a subsample from the first sample, but it may be drawn independently if this is more convenient.

The cost of the two samples is assumed to be of the form

$$C = nc_n + n'c_{n'} \tag{12.1}$$

where $c_n$ is usually large relative to $c_{n'}$.

The problem is to choose n' and the $n_h$ (and consequently n) so as to minimize the variance of the estimate for a given cost. We must then verify whether the minimum variance is smaller than can be attained by a single simple random sample in which $y_i$ alone is measured.


The first step is to set up the estimate and determine its variance. The population mean is

$$\bar{Y} = \sum_{h=1}^{L} W_h\bar{Y}_h$$

As an estimate we use

$$\bar{y}_{st} = \sum_{h=1}^{L} w_h\bar{y}_h$$

Whenever a new sample is drawn, this implies a fresh drawing of both the first and the second samples. Thus the $w_h$ and the sample means $\bar{y}_h$ are both random variables, subject to error. The problem is therefore one of stratification in which the strata totals are not known exactly. The strata boundaries are assumed fixed in repeated sampling.

Theorem 12.1 The estimate $\bar{y}_{st}$ is unbiased.

Proof: Write

$$w_h = W_h + u_h: \quad \bar{y}_h = \bar{Y}_h + e_h$$

Then the error of estimate may be expressed as

$$\bar{y}_{st} - \bar{Y} = \sum_h (w_h\bar{y}_h - W_h\bar{Y}_h) = \sum_h (W_he_h + \bar{Y}_hu_h + u_he_h) \tag{12.2}$$

By the properties of simple random sampling, the quantities $u_h$ and $e_h$ all have means zero. Further, by the method of drawing, $u_h$ and $e_h$ are independently distributed. Hence

$$E(\bar{y}_{st} - \bar{Y}) = 0$$

In the theorem for $V(\bar{y}_{st})$, the sampling ratios $n'/N$, $n_h/N_h$ are assumed negligible, since these assumptions seem valid in the great majority of applications. The variance of $y_{hi}$ in stratum h is denoted as usual by $S_h^2$.

Theorem 12.2 If $n'/N$ and $n_h/N_h$ are negligible,

$$V(\bar{y}_{st}) \doteq \sum_{h=1}^{L}\left[\left\{W_h^2 + \frac{W_h(1 - W_h)}{n'}\right\}\frac{S_h^2}{n_h} + \frac{W_h(\bar{Y}_h - \bar{Y})^2}{n'}\right] \tag{12.3}$$


Proof: From (12.2),

$$V(\bar{y}_{st}) = E(\bar{y}_{st} - \bar{Y})^2 = E\left[\sum_h (W_he_h + \bar{Y}_hu_h + u_he_h)\right]^2 \tag{12.3'}$$

Consider first the squared terms. These are:

$$E\left\{\sum_h (W_he_h + \bar{Y}_hu_h + u_he_h)^2\right\} = \sum_h\left[W_h^2E(e_h^2) + \bar{Y}_h^2E(u_h^2) + E(u_h^2)E(e_h^2)\right] \tag{12.4}$$

since all other terms in (12.4) vanish when the expectation is taken.

Since $n'/N$ is assumed negligible, the variates $w_h$ follow a multinomial distribution: hence $E(u_h^2) = W_h(1 - W_h)/n'$. Thus the squared terms contribute

$$\sum_h\left[\frac{W_h^2S_h^2}{n_h} + \frac{\bar{Y}_h^2W_h(1 - W_h)}{n'} + \frac{W_h(1 - W_h)}{n'}\frac{S_h^2}{n_h}\right] \tag{12.5}$$

Now, consider the cross-product terms between different strata in equation (12.3'). If h and j refer to two strata, there is no contribution from terms of the form $e_he_j$, since sampling is independent in different strata. The only non-zero contribution is that from terms in $\bar{Y}_h\bar{Y}_ju_hu_j$. For the multinomial distribution

$$E(u_hu_j) = -\frac{W_hW_j}{n'}$$

so that cross-product terms contribute

$$-\sum_h\sum_{j \neq h}\frac{\bar{Y}_h\bar{Y}_jW_hW_j}{n'} \tag{12.6}$$

If the middle term in (12.5) is combined with (12.6), the reader may verify that these together amount to

$$\sum_h\frac{W_h(\bar{Y}_h - \bar{Y})^2}{n'}$$

Hence,

$$V(\bar{y}_{st}) \doteq \sum_h\left[\left\{W_h^2 + \frac{W_h(1 - W_h)}{n'}\right\}\frac{S_h^2}{n_h} + \frac{W_h(\bar{Y}_h - \bar{Y})^2}{n'}\right]$$

The term free from n' is the familiar expression for the variance when the strata sizes are known exactly. The effects of errors in the first sample are therefore to increase slightly the within-stratum contribution to the variance, and to introduce a between-stratum component.
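
The identity used in the last step of the proof, namely that the middle term of (12.5) combined with (12.6) equals $\sum W_h(\bar{Y}_h - \bar{Y})^2/n'$, can be confirmed with exact arithmetic. The sketch below is an illustration only: the stratum weights, stratum means, and first-sample size are hypothetical, and the function names are chosen here. The identity relies on the weights summing to one.

```python
from fractions import Fraction


def middle_plus_cross(W, Ybar, n1):
    """Middle term of (12.5) plus the cross-product contribution (12.6)."""
    L = len(W)
    middle = sum(Ybar[h] ** 2 * W[h] * (1 - W[h]) for h in range(L)) / n1
    cross = -sum(Ybar[h] * Ybar[j] * W[h] * W[j]
                 for h in range(L) for j in range(L) if h != j) / n1
    return middle + cross


def between_component(W, Ybar, n1):
    """Between-stratum component: sum of W_h * (Ybar_h - Ybar)^2, divided by n'."""
    mean = sum(w * y for w, y in zip(W, Ybar))
    return sum(w * (y - mean) ** 2 for w, y in zip(W, Ybar)) / n1


W = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # hypothetical weights, summing to 1
Ybar = [Fraction(3), Fraction(5), Fraction(10)]        # hypothetical stratum means
n1 = 50                                                # hypothetical first-sample size n'

print(middle_plus_cross(W, Ybar, n1), between_component(W, Ybar, n1))
```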
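Corollary. If we are estimating a proportion in the second sample, then

$$S_h^2 = \frac{N_h}{N_h - 1}P_hQ_h \doteq P_hQ_h$$

$$(\bar{Y}_h - \bar{Y})^2 = (P_h - P)^2$$

and theorem 12.2 gives

$$V(p_{st}) \doteq \sum_h\left[\left\{W_h^2 + \frac{W_h(1 - W_h)}{n'}\right\}\frac{P_hQ_h}{n_h} + \frac{W_h(P_h - P)^2}{n'}\right] \tag{12.7}$$

where $P_h$ is the proportion in stratum h.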

12.3 Optimum allocation. The values of the $n_h$ and $n'$ that lead to the minimum variance are rather complicated. It is clear from formula (12.3) that $n_h$ should be proportional to

$$S_h \sqrt{W_h^2 + \frac{W_h(1 - W_h)}{n'}}$$

Since the second term inside the root is usually small compared with the first, Neyman (1938) suggests taking $n_h$ proportional to $W_h S_h$. Thus

$$n_h = \frac{n W_h S_h}{\sum (W_h S_h)}$$

When these values are substituted into the variance (12.3), with the term in $W_h(1 - W_h)$ ignored, we obtain

$$V_{opt} \doteq \frac{(\sum W_h S_h)^2}{n} + \frac{\sum W_h(\bar{Y}_h - \bar{Y})^2}{n'} \tag{12.8}$$

$$= \frac{V_n}{n} + \frac{V_{n'}}{n'} \quad \text{(say)} \tag{12.8'}$$


This approximate expression for the variance is now minimized by choice of $n$ and $n'$ for a given cost of the form stated previously

$$C = n c_n + n' c_{n'} \tag{12.1}$$

It is easily found that

$$\frac{n}{\sqrt{V_n c_{n'}}} = \frac{n'}{\sqrt{V_{n'} c_n}} \tag{12.9}$$

This equation and (12.1) determine $n$ and $n'$.

An expression for the minimum variance will be needed for later applications of double sampling. From (12.9),

$$\frac{n}{\sqrt{V_n c_{n'}}} = \frac{n'}{\sqrt{V_{n'} c_n}} = \frac{n c_n + n' c_{n'}}{\sqrt{c_n c_{n'}}\,(\sqrt{V_n c_n} + \sqrt{V_{n'} c_{n'}})} = \frac{C}{\sqrt{c_n c_{n'}}\,(\sqrt{V_n c_n} + \sqrt{V_{n'} c_{n'}})}$$

Substitute these solutions in equation (12.8') for $V_{opt}$. This gives

$$V_{opt} = \frac{(\sqrt{V_n c_n} + \sqrt{V_{n'} c_{n'}})^2}{C} \tag{12.10}$$


Example. This example is a 
involved. We use the Jeffers 


as for farm size (x;), and let the cost be of the form 


$$C = 100 = n + 0.1n' \tag{12.11}$$


This means that, if double sampling is not used ($n' = 0$), we can afford to take a sample of 100 farms to estimate corn acres.

The relevant data for the population are:

Strata        $W_h$    $S_h^2$   $S_h$    $\bar{Y}_h$
1             0.786    312       17.7     19.404
2             0.214    922       30.4     51.626
Population             620                26.297


By formula (12.10) we could proceed at once to compute $V_{opt}$. However, the intermediate steps will be given. We find

$$V_n = \left(\sum W_h S_h\right)^2 = 417$$

$$V_{n'} = \sum W_h(\bar{Y}_h - \bar{Y})^2 = 175$$

so that, by formula (12.9),

$$\frac{n}{n'} = \sqrt{\frac{417}{175}} \sqrt{\frac{1}{10}} = 0.488$$

From the cost equation (12.11) we obtain

$$n' = \frac{100}{0.588} = 170; \qquad n = 170 \times 0.488 = 83$$


At this point the reader may verify from the data in this example that the neglected term in $W_h(1 - W_h)$ in the variance formula (12.3) is in fact negligible. From formula (12.8) we then have

$$V_{opt} = \frac{417}{83} + \frac{175}{170} = 5.02 + 1.03 = 6.05$$

For a random sample of size 100, with no double sampling, we would have

$$V = \frac{620}{100} = 6.20$$

It appears that there would be only a trifling gain from double sampling.
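The arithmetic of this example can be retraced with a short script. The sketch below is only an illustration: the variable and function names (`optimum_double_sample`, `c_n1` for $c_{n'}$, `n_first` for $n'$) are assumptions introduced here, and the inputs are the rounded figures quoted in the table above. It applies (12.9) with the cost equation (12.11), compares the result with the closed form (12.10), and also evaluates the term in $W_h(1 - W_h)$ that (12.8) neglects, taking $n_h$ proportional to $W_h S_h$.

```python
import math

# Stratum figures quoted in the example (weights, standard deviations,
# variances, means) and the population mean.
W = [0.786, 0.214]
S = [17.7, 30.4]
S2 = [312.0, 922.0]
Y_BAR_H = [19.404, 51.626]
Y_BAR = 26.297

# Cost equation (12.11): C = n*c_n + n'*c_n1 with C = 100, c_n = 1, c_n' = 0.1.
C, c_n, c_n1 = 100.0, 1.0, 0.1


def optimum_double_sample(V_n, V_n1, C, c_n, c_n1):
    """Return (n, n', V_opt) minimizing V_n/n + V_n1/n' for fixed cost C."""
    ratio = math.sqrt(V_n * c_n1 / (V_n1 * c_n))   # n/n' from (12.9)
    n_first = C / (c_n * ratio + c_n1)             # solve the cost equation
    n = ratio * n_first
    return n, n_first, V_n / n + V_n1 / n_first    # variance from (12.8')


sum_ws = sum(w * s for w, s in zip(W, S))
V_n = sum_ws ** 2
V_n1 = sum(w * (y - Y_BAR) ** 2 for w, y in zip(W, Y_BAR_H))
n, n_first, V_opt = optimum_double_sample(V_n, V_n1, C, c_n, c_n1)

# Closed form (12.10), which should agree with the value above.
V_closed = (math.sqrt(V_n * c_n) + math.sqrt(V_n1 * c_n1)) ** 2 / C

# Size of the term in W_h(1 - W_h) that (12.8) neglects, taking
# n_h proportional to W_h * S_h.
n_h = [n * w * s / sum_ws for w, s in zip(W, S)]
neglected = sum(w * (1 - w) / n_first * s2 / nh
                for w, s2, nh in zip(W, S2, n_h))

print(f"V_n = {V_n:.0f}, V_n' = {V_n1:.0f}")
print(f"n' = {n_first:.0f}, n = {n:.0f}, V_opt = {V_opt:.2f}")
print(f"neglected term = {neglected:.3f}")
```

Because the script works from rounded inputs, its results agree with the figures in the text only to the number of digits shown there.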
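12.4 Estimated variance in double sampling for stratification. Assuming $n'/N$, $n_h/N_h$ negligible, the true variance of $\bar{y}_{st}$ is, from (12.3),

$$V(\bar{y}_{st}) = \sum_h \left[ \left\{ W_h^2 + \frac{W_h(1 - W_h)}{n'} \right\} \frac{S_h^2}{n_h} + \frac{W_h(\bar{Y}_h - \bar{Y})^2}{n'} \right]$$

If estimates from the sample are substituted in this quantity, the resulting expression turns out to be an overestimate of $V(\bar{y}_{st})$. An unbiased estimate can be constructed without difficulty.

Theorem 12.3. An unbiased estimate of $V(\bar{y}_{st})$ is

$$v(\bar{y}_{st}) = \frac{n'}{n' - 1} \sum_h \left[ \left( w_h^2 - \frac{w_h}{n'} \right) \frac{s_h^2}{n_h} + \frac{w_h(\bar{y}_h - \bar{y}_{st})^2}{n'} \right] \tag{12.12}$$

Proof: This is obtained by substituting $w_h = W_h + u_h$, $\bar{y}_h = \bar{Y}_h + e_h$, in (12.12). The expectations of the successive terms inside the main bracket work out as follows:

$$E \sum_h \frac{w_h^2 s_h^2}{n_h} = \sum_h \left[ W_h^2 + \frac{W_h(1 - W_h)}{n'} \right] \frac{S_h^2}{n_h} \tag{12.13}$$

$$-E \sum_h \frac{w_h s_h^2}{n' n_h} = -\sum_h \frac{W_h S_h^2}{n' n_h} \tag{12.14}$$

$$E \sum_h \frac{w_h(\bar{y}_h - \bar{y}_{st})^2}{n'} = E \sum_h \frac{w_h \bar{y}_h^2}{n'} - \frac{E(\bar{y}_{st}^2)}{n'} = \sum_h \frac{W_h \bar{Y}_h^2}{n'} + \sum_h \frac{W_h S_h^2}{n' n_h} - \frac{\bar{Y}^2}{n'} - \frac{V(\bar{y}_{st})}{n'} \tag{12.15}$$

Adding (12.13), (12.14), and (12.15), we obtain for $(n' - 1)E\,v(\bar{y}_{st})/n'$

$$\sum_h \left[ \left\{ W_h^2 + \frac{W_h(1 - W_h)}{n'} \right\} \frac{S_h^2}{n_h} + \frac{W_h(\bar{Y}_h - \bar{Y})^2}{n'} \right] - \frac{V(\bar{y}_{st})}{n'} = \frac{(n' - 1)}{n'} V(\bar{y}_{st})$$

The theorem follows.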


If $n'$ is large relative to the $n_h$, $v(\bar{y}_{st})$ reduces to

$$v(\bar{y}_{st}) \doteq \sum_h \frac{w_h^2 s_h^2}{n_h} \tag{12.16}$$

This expression is equivalent to assuming that errors in the strata weights $w_h$ can be ignored.

Corollary. If $p_h$ is the observed proportion of units in stratum $h$ which fall into some defined class, and $p_{st} = \sum w_h p_h / \sum w_h$ is the estimate of the population proportion, then an estimate of $V(p_{st})$ is

$$v(p_{st}) = \frac{n'}{n' - 1} \sum_h \left[ \left( w_h^2 - \frac{w_h}{n'} \right) \frac{p_h q_h}{n_h - 1} + \frac{w_h(p_h - p_{st})^2}{n'} \right]$$

In almost all cases, this can be simplified to

$$v(p_{st}) \doteq \sum_h \left[ \frac{w_h^2 p_h q_h}{n_h - 1} + \frac{w_h(p_h - p_{st})^2}{n'} \right]$$

Frequently the term in $1/n'$ can also be dropped.
Example. In a simple random sample of 374 households 292 were occupied by white families and 82 by non-white families. A subsample of about 1 in 4 households gave the following data as to ownership:

              Owned   Rented   Total
White:          31      43       74
Non-white:       4      14       18


Estimate the proportion of rented households in the area from which 
the sample was drawn, and find the standard error of the estimate. 
If the first stratum consists of the white-occupied households,

$$w_1 = \tfrac{292}{374} = 0.78; \qquad w_2 = \tfrac{82}{374} = 0.22$$

$$p_1 = 0.60; \qquad p_2 = 0.78$$

$$p_{st} = w_1 p_1 + w_2 p_2 = 0.64$$

$$n' = 374, \quad n_1 = 74, \quad n_2 = 18$$

It is readily found that only the leading term in $v(p_{st})$ is of importance. Hence

$$v(p_{st}) \doteq \sum_h \frac{w_h^2 p_h q_h}{n_h - 1} = 0.00248$$

$$s(p_{st}) = 0.049$$


The estimated proportion of rented households is 0.64 ± 0.049. The reader may verify that there is only a trifling gain in precision over a single-stage simple random sample of size 92. In view of the relatively small size of the non-white stratum, a greater difference between the proportions of rented households for whites and non-whites would be necessary to make double sampling profitable.
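The claim that only the leading term matters in this example can be checked numerically. The following sketch is an illustration, not part of the original text: the function names `var_pst` and `var_pst_leading` are hypothetical, and the inputs are the rounded values of $w_h$ and $p_h$ quoted in the example. It evaluates the full estimate from the corollary to theorem 12.3 alongside the leading term alone.

```python
def var_pst(w, p, n_h, n_first):
    """Return (p_st, v(p_st)) using the full corollary formula."""
    p_st = sum(wh * ph for wh, ph in zip(w, p))
    inner = sum(
        (wh ** 2 - wh / n_first) * ph * (1 - ph) / (nh - 1)
        + wh * (ph - p_st) ** 2 / n_first
        for wh, ph, nh in zip(w, p, n_h)
    )
    return p_st, n_first / (n_first - 1) * inner


def var_pst_leading(w, p, n_h):
    """Leading term only: sum of w_h^2 * p_h * q_h / (n_h - 1)."""
    return sum(wh ** 2 * ph * (1 - ph) / (nh - 1)
               for wh, ph, nh in zip(w, p, n_h))


# Rounded values as quoted in the example.
w = [0.78, 0.22]
p = [0.60, 0.78]
n_h = [74, 18]
n_first = 374          # size of the first sample, n'

p_st, v_full = var_pst(w, p, n_h, n_first)
v_lead = var_pst_leading(w, p, n_h)

print(f"p_st = {p_st:.2f}")
print(f"full estimate = {v_full:.5f}, leading term = {v_lead:.5f}")
```

The two variance figures differ only in the fifth decimal place, which is the sense in which the terms in $1/n'$ can be dropped here.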
12.5 Regression estimates. In a number of the applications of double sampling, the auxiliary variate $x_i$ has been used to make a regression estimate of $\bar{Y}$. We shall assume that the population is infinite and that the relation between $y_i$ and $x_i$ is linear. Write

$$y_{i\alpha} = \bar{Y} + B(x_i - \bar{X}) + e_{i\alpha} \tag{12.17}$$

where the second subscript $\alpha$ is introduced as a reminder that for fixed $x_i$ the random variate $e_{i\alpha}$ follows a frequency distribution with mean 0 and variance $S_e^2 = S_y^2(1 - \rho^2)$.

In the first (large) sample, of size $n'$, we measure only $x_i$: in the second, of size $n$, we measure both $x_i$ and $y_{i\alpha}$. The estimate of $\bar{Y}$ is

$$\bar{y}_{lr} = \bar{y} + b(\bar{x}' - \bar{x})$$

where $\bar{x}'$, $\bar{x}$ are the means of $x_i$ in the first and second samples, respectively, and $b$ is the least squares regression coefficient of $y_{i\alpha}$ on $x_i$, computed from the second sample.

We now examine the error of estimate $(\bar{y}_{lr} - \bar{Y})$. From (12.17) we find

$$\bar{y} = \bar{Y} + B(\bar{x} - \bar{X}) + \bar{e} \tag{12.18}$$

$$b = \frac{\sum_{i=1}^n (y_{i\alpha} - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^n (x_i - \bar{x})^2} = B + \frac{\sum_{i=1}^n e_{i\alpha}(x_i - \bar{x})}{\sum_{i=1}^n (x_i - \bar{x})^2} \tag{12.19}$$

From (12.18) and (12.19), substitute for $\bar{y}$ and $b$ in the error of estimate. This gives

$$\bar{y}_{lr} - \bar{Y} = (\bar{y} - \bar{Y}) + b(\bar{x}' - \bar{x})$$

$$= B(\bar{x} - \bar{X}) + \bar{e} + B(\bar{x}' - \bar{x}) + (\bar{x}' - \bar{x}) \frac{\sum e_{i\alpha}(x_i - \bar{x})}{\sum (x_i - \bar{x})^2}$$

$$= \bar{e} + (\bar{x}' - \bar{x}) \frac{\sum e_{i\alpha}(x_i - \bar{x})}{\sum (x_i - \bar{x})^2} + B(\bar{x}' - \bar{X}) \tag{12.20}$$

In ordinary regression theory, in which $\bar{x}' = \bar{X}$, the standard practice is to discuss the conditional frequency distribution of the error of estimate $(\bar{y}_{lr} - \bar{Y})$ in repeated samples in which the $x_i$ values are fixed. If this approach is adopted in the present problem, keeping the $x_i$ values fixed in both the first and the second samples, we see that the estimate is biased in the conditional distribution, since

$$E(\bar{y}_{lr} - \bar{Y}) = B(\bar{x}' - \bar{X})$$

If the bias is not too large, we have seen (section 1.5) that it may be taken into account by adding its square to the variance of $\bar{y}_{lr}$. Hence we may regard the conditional variance $V_c$ of $\bar{y}_{lr}$ as

$$V_c(\bar{y}_{lr}) = S_y^2(1 - \rho^2)\left[\frac{1}{n} + \frac{(\bar{x}' - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right] + B^2(\bar{x}' - \bar{X})^2 \tag{12.21}$$


This expression is not suitable for comparison with other methods of sampling, since the variance depends on the set of $x_i$ which appears in the two samples. Instead, we need the average variance over all possible drawings of the first and second samples.

A simple result for the average variance is obtained under the assumption that (i) the first sample is drawn at random, (ii) the second sample is a random subsample drawn from the first, and (iii) the $x_i$ are normally distributed. In this event the average variance is found to be:

$$V(\bar{y}_{lr}) = S_y^2(1 - \rho^2)\left[\frac{1}{n} + \left(\frac{1}{n} - \frac{1}{n'}\right)\frac{1}{n - 3}\right] + \frac{B^2 S_x^2}{n'} \tag{12.22}$$

$$= \frac{S_y^2(1 - \rho^2)}{n}\left[1 + \frac{n' - n}{n'(n - 3)}\right] + \frac{\rho^2 S_y^2}{n'}$$

since $B^2 S_x^2 = \rho^2 S_y^2$.

If the $x_i$ are not normally distributed, the only term whose value is changed is that in $1/(n - 3)$, as discussed in section 7.3. As regards assumption ii, the small sample might not be drawn at random from the large sample: it is preferable to select the small sample so as to obtain a wide spread in the values of $x_i$ and hence reduce the sampling error of $b$. The effect is to reduce, perhaps considerably, the term in $1/(n - 3)$.

In some applications the second sample is drawn independently of the first. In this event the argument given in this section remains unchanged down to equation (12.21). In equation (12.22) the term $\left(\frac{1}{n} - \frac{1}{n'}\right)$ is replaced by $\left(\frac{1}{n} + \frac{1}{n'}\right)$. This case of two independent samples was first considered by Chameli Bose (1943).

To summarize, there is some doubt about the exact value of the term in $1/(n - 3)$ in the average variance. However, if $1/n$ is negligible, this term is also negligible. This gives the following theorem.

Theorem 12.4 If the first sample is of size $n'$ and the second is of size $n$, and if $1/n$ is negligible, the variance of $\bar{y}_{lr}$, the regression estimate in double sampling, is given approximately by

$$V(\bar{y}_{lr}) \doteq \frac{S_y^2(1 - \rho^2)}{n} + \frac{\rho^2 S_y^2}{n'} \tag{12.24}$$


12.6 Double sampling with regression versus single sampling. From 
the variance formula (12.24), double sampling with a regression esti- 
mate can be compared with a single simple random sample, under the 
assumption that (i) the first sample is a simple random sample, (ii) 
1/n is negligible, and (iii) the second sample is also a simple random 
sample. Results for this case should provide a rough guide to other cases.

Write

$$V(\bar{y}_{lr}) = \frac{V_n}{n} + \frac{V_{n'}}{n'}$$

where

$$V_n = S_y^2(1 - \rho^2); \qquad V_{n'} = \rho^2 S_y^2$$

$$\text{Cost} = C = n c_n + n' c_{n'}$$

The problem of finding the optimum $n$ and $n'$ and the minimum variance is exactly the same as in double sampling for stratification (section 12.3). Equation (12.10) gives

$$V_{opt} = \frac{(\sqrt{V_n c_n} + \sqrt{V_{n'} c_{n'}})^2}{C} = \frac{S_y^2 [\sqrt{(1 - \rho^2) c_n} + \rho \sqrt{c_{n'}}]^2}{C} \tag{12.25}$$

where $\rho$ is taken as positive.

If all resources are devoted to a single sample, with no adjustment for regression, this sample is of size $n_s = C/c_n$, and the variance of its mean is

$$V(\bar{y}) = \frac{S_y^2}{n_s} = \frac{c_n S_y^2}{C} \tag{12.26}$$

Hence, double sampling gives a smaller variance if

$$c_n > [\sqrt{(1 - \rho^2) c_n} + \rho \sqrt{c_{n'}}]^2$$

This inequality may be expressed in two ways:

$$\frac{c_n}{c_{n'}} > \frac{(1 + \sqrt{1 - \rho^2})^2}{\rho^2} \tag{12.27}$$

or

$$\rho^2 > \frac{4 c_n c_{n'}}{(c_n + c_{n'})^2} \tag{12.28}$$

Equation (12.27) shows that, for a given value of $\rho$, the ratio of the cost per unit in the second sample to the cost per unit in the first sample must exceed a critical value before double sampling brings an increase in precision. Given $c_n$ and $c_{n'}$, equation (12.28) shows the critical value that must be exceeded by $\rho^2$ to make double sampling profitable.

FIGURE 12.1 Relation between $c_n/c_{n'}$ and $\rho$ for three fixed values of the relative precision of double and single sampling. Vertical axis: ratio of cost per unit in second sample to cost per unit in first sample. Horizontal axis: $\rho$ = correlation between $y_i$ and $x_i$.
Curve I: double and single sampling equally precise.
Curve II: double sampling gives 25 per cent increase in precision.
Curve III: double sampling gives 50 per cent increase in precision.

Figure 12.1 plots the values of the ratio $c_n/c_{n'}$ (on a log scale) against $\rho$. Curve I is the relationship when double and single sampling are equally precise; curve II holds when $V_{opt} = 0.8V(\bar{y})$, i.e. when double sampling gives a 25 per cent increase in precision; and curve III refers to a 50 per cent increase in precision. For example, when $\rho = 0.8$, double sampling equals single sampling in precision if $c_n/c_{n'}$ is 4, gives a 25 per cent increase in precision if $c_n/c_{n'}$ is about $7\tfrac{1}{2}$, and gives a 50 per cent increase if $c_n/c_{n'}$ is about 13.

For practical use, the curves overestimate the gains to be achieved from double sampling, because the best values of $n$ and $n'$ must either be estimated from previous data or be guessed. Some allowance for errors in these estimations should be made before deciding to adopt double sampling.

For any $\rho$, there is an upper limit to the gain in precision from double sampling. This occurs when information on $\bar{x}'$ is obtained free ($c_{n'} = 0$). The upper limit to the relative precision is $1/(1 - \rho^2)$.
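The comparison in this section reduces to one ratio: from (12.25) and (12.26), the relative precision of double to single sampling is $c_n / [\sqrt{(1 - \rho^2) c_n} + \rho\sqrt{c_{n'}}]^2$, which depends on the costs only through $c_n/c_{n'}$. The sketch below is a minimal illustration of that relation; the function names are assumptions introduced here. `critical_cost_ratio` inverts the relation to give the cost ratio needed for a stated relative precision, which is the quantity plotted in figure 12.1.

```python
import math


def relative_precision(rho, cost_ratio):
    """V(ybar) / V_opt from (12.25) and (12.26); cost_ratio = c_n / c_n'."""
    s = math.sqrt(1 - rho ** 2)
    return cost_ratio / (s * math.sqrt(cost_ratio) + rho) ** 2


def critical_cost_ratio(rho, rel_precision=1.0):
    """Cost ratio c_n / c_n' at which double sampling attains rel_precision.

    With rel_precision = 1 this is the break-even value in (12.27).
    """
    s = math.sqrt(1 - rho ** 2)
    k = math.sqrt(rel_precision)
    if k * s >= 1:
        # The target is at or beyond the upper limit 1 / (1 - rho^2).
        raise ValueError("relative precision not attainable for this rho")
    return (k * rho / (1 - k * s)) ** 2


if __name__ == "__main__":
    for target in (1.0, 1.25, 1.5):
        print(target, round(critical_cost_ratio(0.8, target), 2))
```

A target at or above $1/(1 - \rho^2)$ raises an error, since no cost ratio can achieve it.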


12.7 Estimated variance in double sampling for regression. If terms in $1/n$ are negligible, $V(\bar{y}_{lr})$ is given by equation (12.24):

$$V(\bar{y}_{lr}) \doteq \frac{S_y^2(1 - \rho^2)}{n} + \frac{\rho^2 S_y^2}{n'}$$

By section 7.3, the quantity

$$s_{y.x}^2 = \frac{1}{n - 2}\left[\sum_{i=1}^n (y_i - \bar{y})^2 - b^2 \sum_{i=1}^n (x_i - \bar{x})^2\right]$$

is an unbiased estimate of $S_y^2(1 - \rho^2)$, where the subscript $\alpha$ has now been dropped. Since

$$s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1}$$

is an unbiased estimate of $S_y^2$, it follows that

$$(s_y^2 - s_{y.x}^2)$$

is an unbiased estimate of $\rho^2 S_y^2$.

Thus a sample estimate of $V(\bar{y}_{lr})$ is

$$v(\bar{y}_{lr}) = \frac{s_{y.x}^2}{n} + \frac{s_y^2 - s_{y.x}^2}{n'} \tag{12.29}$$

If the second sample is very small and terms in $1/n$ are not negligible, a suggested estimate of variance is

$$v(\bar{y}_{lr}) = s_{y.x}^2\left[\frac{1}{n} + \frac{(\bar{x}' - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right] + \frac{s_y^2 - s_{y.x}^2}{n'}$$

This is a kind of hybrid of the conditional variance and the average variance.
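Formula (12.29) needs only sums of squares and products from the second sample, plus the size of the first. The sketch below shows one way the computation might be laid out. The function name and the five data pairs are hypothetical, chosen purely to make the arithmetic easy to follow; a real second sample would be much larger, since (12.29) assumes terms in $1/n$ are negligible.

```python
def regression_variance_estimate(y, x, n_first):
    """Sample estimate of V(ybar_lr) from (12.29).

    y, x    : paired observations from the second (small) sample
    n_first : size n' of the first (large) sample
    """
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(y, x))

    b = sxy / sxx                            # least squares slope
    s2_yx = (syy - b ** 2 * sxx) / (n - 2)   # residual mean square
    s2_y = syy / (n - 1)                     # sample variance of y
    return s2_yx / n + (s2_y - s2_yx) / n_first


# Hypothetical data, for illustration only.
y = [2.0, 4.1, 5.9, 8.2, 9.8]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
v = regression_variance_estimate(y, x, n_first=50)
print(f"v = {v:.5f}, standard error = {v ** 0.5:.4f}")
```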


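12.8 Ratio estimates. If the first sample is used to obtain $\bar{x}'$ for a ratio estimate of $\bar{Y}$, the estimate is

$$\bar{y}_R = \frac{\bar{y}}{\bar{x}}\,\bar{x}' \tag{12.30}$$

The error of estimate may be written

$$\bar{y}_R - \bar{Y} = \frac{\bar{X}}{\bar{x}}(\bar{y} - R\bar{x}) + \frac{\bar{y}}{\bar{x}}(\bar{x}' - \bar{X})$$

The first component is the error of the ordinary ratio estimate (section 6.3). In obtaining the approximate error variance in section 6.3, we replaced the factor $\bar{X}/\bar{x}$ by unity in this term. To the same order of approximation, we replace the factor $\bar{y}/\bar{x}$ in the second component by the population ratio $R = \bar{Y}/\bar{X}$. Thus

$$\bar{y}_R - \bar{Y} \doteq (\bar{y} - R\bar{x}) + R(\bar{x}' - \bar{X}) \tag{12.31}$$

If the first and second samples are drawn independently, we obtain

$$V(\bar{y}_R) \doteq \frac{S_y^2 - 2RS_{yx} + R^2 S_x^2}{n} + \frac{R^2 S_x^2}{n'} \tag{12.32}$$

where the fpc terms are assumed negligible.

If the second sample is a random subsample of the first, rearrange equation (12.31) in the form

$$\bar{y}_R - \bar{Y} \doteq \bar{y} - R\bar{x} + R(\bar{x}' - \bar{X}) = (\bar{y} - \bar{Y}) + R(\bar{x}' - \bar{x})$$

It may be verified that, with the fpc ignored,

$$V(\bar{y}) = \frac{S_y^2}{n}$$

$$\text{cov}\,[(\bar{y} - \bar{Y}),\ R(\bar{x}' - \bar{x})] = -RS_{yx}\left(\frac{1}{n} - \frac{1}{n'}\right)$$

$$V\{R(\bar{x}' - \bar{x})\} = R^2 S_x^2\left(\frac{1}{n} - \frac{1}{n'}\right)$$

Hence $V(\bar{y}_R)$ takes the form

$$V(\bar{y}_R) \doteq \frac{S_y^2 - 2RS_{yx} + R^2 S_x^2}{n} + \frac{2RS_{yx} - R^2 S_x^2}{n'} \tag{12.33}$$

Note that formulas (12.32) and (12.33) are both of the form

$$V(\bar{y}_R) = \frac{V_n}{n} + \frac{V_{n'}}{n'}$$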


Hence the optimum choices of n and n’, and the minimum variance for 
comparison with single sampling, are found by the same procedure as 
for stratification and regression estimates. Details will not be given. 

For sample estimates of variance, the quantities $s_y^2$, $s_{yx}$, $s_x^2$, and $\hat{R}$ may be substituted in (12.32) and (12.33). The resulting estimates $v(\bar{y}_R)$ are not unbiased, but appear to be adequate to the order of approximation presented in the analysis.


12.9 Repeated sampling of the same population. As confidence in sampling has increased, the practice of relying on samples for the collection of important series of data that are published at regular intervals is becoming more common. In part, this is due to a realization that with a dynamic population a census at infrequent intervals is of limited use. Highly precise information about the characteristics of a population in July 1945 and July 1950 may not help much in planning that demands a knowledge of the population in 1952. A series of relatively small samples at annual or even shorter intervals may be more serviceable.

When the same population (apart from the changes which the passage of time introduces) is sampled repeatedly, we are in an ideal position to make realistic estimates both of costs and of variances and to apply the techniques that lead to optimum efficiency of sampling. One important question is how frequently and in what manner the sample should be changed as time progresses. Many considerations affect the decision. People may be unwilling to give the same type of information time after time. The respondents may be influenced by
information which they receive at the interviews, and this may make 
them progressively less representative as time proceeds. Sometimes, 
however, cooperation is better in a second interview than in the first, 
and when the information is technical or confidential the second visit 
may produce more accurate data than the first. 

In the remainder of this chapter we shall consider the question of 
replacement of the sample and the related question of making esti- 
mates from the series of repeated samples. The topic is appropriate 
to the present chapter because double sampling techniques can be 
utilized. 

Given the data from a series of samples, there are three kinds of 
quantity for which we may wish estimates: 

i. The change in $\bar{Y}$ from one occasion to the next.

ii. The average value of $\bar{Y}$ over all occasions.

iii. The average value of $\bar{Y}$ for the most recent occasion.

In most surveys, interest centers on the current average (iii), par- 
ticularly if the characteristics of the population are likely to change 
rapidly with time. With a population in which time changes are 
slow, on the other hand, an annual average (ii) taken over twelve 
monthly samples or four quarterly samples may be adequate for the 
major uses. This would be the situation in a study of the prevalence 
of chronic diseases of long duration. With a disease whose prevalence shows marked seasonal variation, the current data would be of major interest, but annual averages would also be useful for comparisons between different regions and different years. Estimates of change (i) are wanted mainly in attempts to study the effects of forces that are known to have acted on the population. For instance, if a bill is
passed which is supposed to stimulate the building of houses, it is 
interesting to know whether the building rate of new houses has in- 
creased in the succeeding year (with a realization that an increase 
may not be entirely due to the bill). 

Suppose that we are free to alter or retain the composition of the 
sample, and that the total size of sample is to be the same on all occasions. If we wish to maximize precision, the following statements can be made about replacement policy:

i. For estimating change, it is best to retain the same sample throughout all occasions.

ii. For estimating the average over all occasions, it is best to draw a new sample on each occasion.

iii. For current estimates, equal precision is obtained either by keeping the same sample or by changing it on every occasion. Replacement of part of the sample on each occasion may be better than these alternatives.
Statements i and ii hold because there is nearly always a positive correlation between the measurements on the same unit on two successive occasions. The variance of the estimated change on a unit is $(S_1^2 + S_2^2 - 2\rho S_1 S_2)$, where the subscripts refer to the occasions. If change is estimated from two different units, the variance is $(S_1^2 + S_2^2)$. In estimating the overall mean for the two occasions, the variance is $(S_1^2 + S_2^2 + 2\rho S_1 S_2)/4$ if the same unit is retained, and $(S_1^2 + S_2^2)/4$ if a new unit is chosen.
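As an illustrative check of statements i and ii, suppose $S_1 = S_2 = S$ and $\rho = 0.8$ (values assumed here for the sake of the example, not taken from the text). The four variances in the preceding paragraph then become:

```latex
\begin{align*}
\text{change, same unit:}       \quad & S^2 + S^2 - 2(0.8)S^2 = 0.4\,S^2 \\
\text{change, different units:} \quad & S^2 + S^2 = 2\,S^2 \\
\text{mean, same unit:}         \quad & \{S^2 + S^2 + 2(0.8)S^2\}/4 = 0.9\,S^2 \\
\text{mean, new unit:}          \quad & (S^2 + S^2)/4 = 0.5\,S^2
\end{align*}
```

With a positive correlation, retaining the unit lowers the variance of the estimated change ($0.4S^2$ against $2S^2$) but raises the variance of the overall mean ($0.9S^2$ against $0.5S^2$).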


Statement iii, which is less obvious, is investigated in succeeding 
sections. 


12.10 Sampling on two occasions. Suppose that the samples are of 
the same size n on both occasions, and that the current estimates are 
of primary interest. Replacement policy has been examined by Jessen 
(1942). For simplicity, we assume that simple random sampling is used and that the population variance $S^2$ of $y_i$ is the same on both occasions.


The mean of the first samp) 
previous information to utilize. In selecting the second sample, $m$ of the units in the first sample are retained, and the remaining $u$ units ($u$ for unmatched) are replaced by a new selection.
Notation.

$\bar{y}_{hu}$ = Mean of unmatched portion on occasion $h$.
$\bar{y}_{hm}$ = Mean of matched portion on occasion $h$.
$\bar{y}_h$ = Mean of whole sample on occasion $h$.


The unmatched and matched portions of the second sample provide independent estimates $\bar{y}_{2u}'$, $\bar{y}_{2m}'$ of $\bar{Y}_2$, as shown in table 12.1.

              Estimate                                                        Variance
Unmatched:    $\bar{y}_{2u}' = \bar{y}_{2u}$                                  $S^2/u = 1/W_{2u}$
Matched:      $\bar{y}_{2m}' = \bar{y}_{2m} + b(\bar{y}_1 - \bar{y}_{1m})$    $S^2(1 - \rho^2)/m + \rho^2 S^2/n = 1/W_{2m}$

In the


matched portion, we use a double sampling regression estimate, where the “large” sample is the first sample and the auxiliary variate $x_i$ is the value of $y_i$ on the first occasion. The variance of $\bar{y}_{2m}'$ comes from formula (12.24), p. 278: note that our $m$ and $n$ correspond to $n$ and $n'$, respectively, in formula (12.24).

The best combined estimate of $\bar{Y}_2$ is found by weighting the two independent estimates inversely as their variances. If $W_{2u}$, $W_{2m}$ are the inverse variances, this estimate is

$$\bar{y}_2' = \phi_2 \bar{y}_{2u}' + (1 - \phi_2)\bar{y}_{2m}' \tag{12.34}$$

where

$$\phi_2 = \frac{W_{2u}}{W_{2u} + W_{2m}}$$

By least squares theory, the variance of $\bar{y}_2'$ is

$$V(\bar{y}_2') = \frac{1}{W_{2u} + W_{2m}}$$

From table 12.1, this works out after simplification as

$$V(\bar{y}_2') = \frac{S^2(n - u\rho^2)}{n^2 - u^2\rho^2} \tag{12.35}$$


Note that, if $u = 0$ (complete matching) or if $u = n$ (no matching), this variance has the same value, $S^2/n$.

The optimum value of $u$ is found by minimizing (12.35) with respect to variation in $u$. This gives

$$\frac{u}{n} = \frac{1}{1 + \sqrt{1 - \rho^2}}, \qquad \frac{m}{n} = \frac{\sqrt{1 - \rho^2}}{1 + \sqrt{1 - \rho^2}} \tag{12.36}$$

When the optimum $u$ is substituted in (12.35), the minimum variance works out as

$$V_{opt}(\bar{y}_2') = \frac{S^2}{2n}(1 + \sqrt{1 - \rho^2}) \tag{12.37}$$

Table 12.2 shows for a series of values of $\rho$ the optimum per cent which should be matched and the relative gain in precision as compared with no matching. The best percentage to match never exceeds 50 per cent and decreases steadily as $\rho$ increases. When $\rho = 1$, the formula suggests $m = 0$, which lies outside the range of our assumptions, since $m$ has been assumed reasonably large. The correct procedure in this case is to take $m = 2$. The two matched units are sufficient to determine the regression line exactly.

The greatest attainable gain in precision is 100 per cent, when $\rho = 1$. Unless $\rho$ is high, the gains are modest.

Although the optimum percentage to match varies with $\rho$, only a single percentage can be used in practice for all items in a survey. The right-hand columns of table 12.2 show the per cent gains in precision when one-third and one-fourth of the units are matched. Both are good compromises, except for items in which $\rho$ exceeds 0.95.


TABLE 12.2 OPTIMUM PER CENT MATCHED

                                       % gain in precision when
$\rho$    Optimum      % gain in      $m/n = 1/3$    $m/n = 1/4$
          % matched    precision
0.5          46            7               7              6
0.6          44           11              11              9
0.7          42           17              16             15
0.8          38           25              25             23
0.9          30           39              39             39
0.95         24           52              50             52
1.0           0          100              67             75
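The entries of table 12.2 follow directly from (12.35) and (12.36), so they can be regenerated for any value of $\rho$. The sketch below is an illustration only; the function names are assumptions introduced here. It expresses (12.35) in terms of the matched fraction $m/n$ and reports the gain relative to the variance $S^2/n$ obtained with no matching.

```python
import math


def optimum_matched_fraction(rho):
    """Optimum m/n on the second occasion, from (12.36)."""
    s = math.sqrt(1 - rho ** 2)
    return s / (1 + s)


def variance_factor(rho, matched_fraction):
    """n * V / S^2 from (12.35), with u/n = 1 - matched_fraction."""
    u = 1 - matched_fraction
    return (1 - u * rho ** 2) / (1 - u ** 2 * rho ** 2)


def percent_gain(rho, matched_fraction):
    """Per cent gain in precision relative to no matching (variance S^2/n)."""
    return 100 * (1 / variance_factor(rho, matched_fraction) - 1)


if __name__ == "__main__":
    # rho = 1 is left out of the loop: the optimum there is m = 0, for which
    # (12.35) reduces to 0/0.
    for rho in (0.5, 0.6, 0.7, 0.8, 0.9, 0.95):
        m_opt = optimum_matched_fraction(rho)
        print(f"{rho:4.2f}  {100 * m_opt:5.1f}  {percent_gain(rho, m_opt):5.1f}"
              f"  {percent_gain(rho, 1 / 3):5.1f}  {percent_gain(rho, 1 / 4):5.1f}")
```

The output is printed to one decimal place; the table gives the same quantities rounded to whole per cents.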


12.11 Sampling on more than two occasions. 


On occasion h, we may have parts of the 
sample that are matched with occasion (h — 1), parts that are matched 
with both occasions (h — 1) and (h — 2), and 
to improve the current estimate, 


n introduction to the subject. At- 


c d irent estimates in which only the re- 
gression on the sample immediately preceding is used. This results 
in some loss of precision, but since the correlation $\rho$ usually decreases as the time interval between the occasions is increased, the loss of precision will seldom be great. The variance $S^2$ and the correlation coefficient $\rho$ between the item values on the same unit on two successive occasions are assumed constant throughout.

On the third occasion, let $m$ and $u$ be the numbers of units that are matched and unmatched, respectively, with the second occasion. The two estimates of $\bar{Y}_3$ that can be made are given in table 12.3. The only change in procedure from the second occasion (table 12.1) is that, in the regression adjustment of the estimate from the matched portion, we use the improved estimate $\bar{y}_2'$ instead of the sample mean $\bar{y}_2$.

TABLE 12.3 ESTIMATES OF $\bar{Y}_3$ ON THE THIRD OCCASION

              Estimate                                                         Variance
Unmatched:    $\bar{y}_{3u}' = \bar{y}_{3u}$                                   $S^2/u = 1/W_{3u}$
Matched:      $\bar{y}_{3m}' = \bar{y}_{3m} + b(\bar{y}_2' - \bar{y}_{2m})$    $S^2(1 - \rho^2)/m + \rho^2 V(\bar{y}_2') = 1/W_{3m}$

The variance of the matched estimate $\bar{y}_{3m}'$ in table 12.3 is derived from equation (12.22) of section 12.5, after some translation of notation. Equation (12.22), omitting the terms in $1/n^2$, reads as follows:

$$V(\bar{y}_{lr}) = V(\bar{y}_{3m}') \doteq \frac{S_y^2(1 - \rho^2)}{n} + \frac{B^2 S_x^2}{n'}$$

The translations needed to the present notation are:

$n$ = Size of “small” sample = $m$.

$S_y^2 = S^2$.

$B = \rho$, because $S^2$ is assumed the same on both occasions.

$S_x^2/n'$ = Variance of estimated mean from “large” sample = $V(\bar{y}_2')$.

When these substitutions are made, the formula shown in table 12.3 is obtained.
On the $h$th occasion, the two estimates remain as in table 12.3 if the subscript 3 is replaced by $h$ and the subscript 2 by $(h - 1)$. We now find the optimum values of $m$ and $u$ on any occasion. It will be shown that the optimum $m$ increases steadily on the successive occasions, and rather rapidly approaches a limiting value of $\tfrac{1}{2}n$.

Weighting as before inversely as the variance, the best estimate of $\bar{Y}_h$ is

$$\bar{y}_h' = \phi_h \bar{y}_{hu}' + (1 - \phi_h)\bar{y}_{hm}' \tag{12.38}$$

where

$$\phi_h = \frac{W_{hu}}{W_{hu} + W_{hm}}$$

Since

$$V(\bar{y}_h') = \frac{1}{W_{hu} + W_{hm}} \tag{12.39}$$

we can find the optimum $m$ on occasion $h$ by maximizing $(W_{hu} + W_{hm})$.

It is helpful to write

$$V(\bar{y}_h') = \frac{g_h S^2}{n} \tag{12.40}$$

Since $V(\bar{y}_1') = S^2/n$, the quantity $g_h$ is the ratio of the variance on occasion $h$ to that on the first occasion. If the successive estimates become steadily more precise, $g_h$ will be a decreasing function of $h$.

Now, from table 12.3, with $h$ in place of 3,

$$\frac{1}{W_{hu}} = \frac{S^2}{u}; \qquad \frac{1}{W_{hm}} = \frac{S^2(1 - \rho^2)}{m} + \frac{\rho^2 S^2 g_{h-1}}{n}$$

This gives

$$W_{hu} + W_{hm} = \frac{1}{S^2}\left[u + \frac{1}{\dfrac{1 - \rho^2}{m} + \dfrac{\rho^2 g_{h-1}}{n}}\right] \tag{12.41}$$

Hence

$$S^2(W_{hu} + W_{hm}) = (n - m) + \frac{1}{\dfrac{1 - \rho^2}{m} + \dfrac{\rho^2 g_{h-1}}{n}}$$

By differentiation with respect to $m$, the optimum $m_h$ is found to satisfy the equation

$$\frac{\sqrt{1 - \rho^2}}{m_h} = \frac{1 - \rho^2}{m_h} + \frac{\rho^2 g_{h-1}}{n}$$

This gives

$$\frac{m_h}{n} = \frac{\sqrt{1 - \rho^2}}{g_{h-1}(1 + \sqrt{1 - \rho^2})} \tag{12.42}$$


This equation is a generalization of equation (12.36) which gave the optimum $m$ on the second occasion. Equation (12.42) suggests that the optimum percentage to match will increase steadily with time, because we would expect $V(\bar{y}_h')$, and hence $g_h$, to decrease with time.

In order to complete the solution, it is necessary to find the value of $g_h$. By its definition, (12.40),

$$\frac{1}{g_h} = \frac{S^2}{n}(W_{hu} + W_{hm}) = \frac{1}{n}\left[u_h + \frac{1}{\dfrac{1 - \rho^2}{m_h} + \dfrac{\rho^2 g_{h-1}}{n}}\right]$$

from (12.41).

By substituting the optimum $m$ and $u$ from equation (12.42) into this equation, we obtain a recurrence relation which connects $g_h$ with $g_{h-1}$. After some algebraic manipulation, the relation is expressible as

$$\frac{1}{g_h} = 1 + \frac{1 - \sqrt{1 - \rho^2}}{g_{h-1}(1 + \sqrt{1 - \rho^2})} \tag{12.43}$$

Since $g_1 = 1$, the successive values of $g_h$, and hence the minimum variance and the optimum $m_h$, can be worked out for any given value of $\rho$. As expected, it is found that the successive values of $g_h$ steadily decrease, whereas those of $m_h$ steadily increase.

It is easy to show that, as $h$ increases, the quantities $g_h$ tend to a limit, whose value is obtained by putting $g_h = g_{h-1} = g_\infty$ in equation (12.43) and solving for $g_\infty$. This gives

$$g_\infty = \frac{2\sqrt{1 - \rho^2}}{1 + \sqrt{1 - \rho^2}}$$

Hence the variance of $\bar{y}_h'$ tends to

$$V(\bar{y}_\infty') = \frac{S^2}{n}\left(\frac{2\sqrt{1 - \rho^2}}{1 + \sqrt{1 - \rho^2}}\right) \tag{12.44}$$

Finally, the limiting value of $m_h$ is obtained from equation (12.42) as

$$\frac{m_\infty}{n} = \frac{\sqrt{1 - \rho^2}}{g_\infty(1 + \sqrt{1 - \rho^2})} = \frac{1}{2}$$

irrespective of the value of $\rho$.

Table 12.4 shows the optimum percentage matched ($100m_h/n$, as found from equation (12.42)) and the percentage gains in precision, as computed from equation (12.43), for $\rho$ = 0.7, 0.8, 0.9, and 0.95, and for a series of values of $h$.

TABLE 12.4 OPTIMUM PER CENT MATCHED AND GAINS IN PRECISION

            % matched, $100m_h/n$            % gain in precision
$h$      $\rho$ = 0.7   0.8   0.9   0.95     $\rho$ = 0.7   0.8   0.9   0.95
2            42       38    30    24            17       25    39    52
3            49       47    42    36            19       31    55    80
4            50       49    47    43            20       33    61    94
5            50       50    49    46            20       33    63   102
6            50       50    50    48            20       33    64   106
$\infty$     50       50    50    50            20       33    65   110
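The rows of table 12.4 come from iterating (12.42) and (12.43) starting at $g_1 = 1$. The sketch below shows how that iteration might be coded; the function names `matching_schedule` and `limiting_gain` are assumptions introduced here. The gain on occasion $h$ is taken as $100(1/g_h - 1)$, that is, relative to the variance $S^2/n$ of the first occasion.

```python
import math


def matching_schedule(rho, occasions):
    """Optimum per cent matched and per cent gain for occasions 2, 3, ...

    Returns a list of (h, per cent matched, per cent gain) tuples.
    """
    s = math.sqrt(1 - rho ** 2)
    a = (1 - s) / (1 + s)
    g_prev = 1.0                            # g_1 = 1 on the first occasion
    rows = []
    for h in range(2, occasions + 1):
        m_frac = s / (g_prev * (1 + s))     # optimum m_h / n, from (12.42)
        g = 1 / (1 + a / g_prev)            # recurrence (12.43)
        rows.append((h, 100 * m_frac, 100 * (1 / g - 1)))
        g_prev = g
    return rows


def limiting_gain(rho):
    """Per cent gain as h tends to infinity, using the limit g_infinity."""
    s = math.sqrt(1 - rho ** 2)
    g_inf = 2 * s / (1 + s)
    return 100 * (1 / g_inf - 1)


if __name__ == "__main__":
    for h, matched, gain in matching_schedule(0.8, 6):
        print(f"h = {h}: matched {matched:.1f}%, gain {gain:.1f}%")
    print(f"limit: gain {limiting_gain(0.8):.1f}%")
```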


These results suggest that with repeated sampling a good working 
rule is to retain a fraction one-third or one-quarter of the first sample 
on the second occasion. Thereafter, one-half of the sample should be 
retained on each occasion, and one-half drawn anew. 

These recommendations assume that all replacement policies cost 
the same and are equally feasible, given that the total sample size 
remains fixed. In practice, questions of cost and feasibility should 
not be overlooked. Since extra costs are involved in drawing and con- 
tacting a new sample, cost considerations will point to a slower rate 
of turnover. Moreover, table 12.4 makes it clear that, unless $\rho$ exceeds 0.8, the per cent gains in precision over complete matching are
modest, and large departures from the optimum matching will not in 
general result in a serious loss of precision. 


12.12 Exercises. 


12.1 A population contains $L$ strata of equal size. If $V_{ran}$ denotes the variance of the mean of a simple random sample, and $V_{st}$, $V_{ds}$ are the corresponding variances for stratified random sampling with proportional allocation and for double sampling with stratification, show that, approximately,

$$nV_{ran} = S_w^2 + \frac{\sum (\bar{Y}_h - \bar{Y})^2}{L}$$

$$nV_{st} = S_w^2$$

$$nV_{ds} = S_w^2 + \frac{n}{n'} \frac{\sum (\bar{Y}_h - \bar{Y})^2}{L}$$

where $S_w^2$ is the average variance within strata. ($N$ and $n'$ may both be assumed large relative to $L$, and the $n_h$ in double sampling may be assumed equal to $n/L$.)

Hence, if $(RP)_{st}$ denotes the relative precision of the stratified sample to the simple random sample, with a corresponding definition for $(RP)_{ds}$, show that

$$(RP)_{ds} = \frac{(RP)_{st}}{1 + \dfrac{n}{n'}\{(RP)_{st} - 1\}}$$

For $(RP)_{st} = 2$, plot $(RP)_{ds}$ against $n/n'$. How small must this ratio be in order that $(RP)_{ds} = 1.9$?

12.2 If $\rho = 0.8$ in double sampling for regression, how large must $n'$ be
relative to n, if the loss in precision due to sampling errors in the mean of 
the large sample is to be less than 10 per cent? 

12.3 In an application of double sampling for regression, the small sample was of size 87 and the large sample of size 300. The following computations apply to the small sample:

$$\sum (y_i - \bar{y})^2 = 17{,}283; \quad \sum (y_i - \bar{y})(x_i - \bar{x}) = 5114; \quad \sum (x_i - \bar{x})^2 = 3248$$

Compute the standard error of the regression estimate of $\bar{Y}$.

12.4 For $\rho = 0.95$, verify the data given in table 12.4 for the optimum percentage which should be matched and for the gain in precision relative to no matching. Compute the corresponding per cent gains in precision if one-third of the units are retained from the first to the second occasion, and one-half of the units are retained on each subsequent occasion.


CHAPTER 13
SOURCES OF ERROR IN SURVEYS 


13.1 Introduction. The theory presented in previous chapters as- 
sumes throughout that some kind of probability sampling is used and 
that the observation y; on the ith unit is the correct value for that 
unit. The error of estimate arises solely from the random sampling 
variation that is present when n of the units are measured instead of 
the complete population of N units. 

These assumptions hold reasonably well in the simpler types of sur- 
veys in which the measuring devices are accurate and the quality of 
work is high. In complex surveys, particularly where difficult prob- 
lems of measurement are involved, the assumptions may be far from 
true. Three additional sources of error that may be present are as 
follows: 

i. Failure to measure some of the units in the chosen sample. 
This may occur by oversight, or, with human populations, because 
of failure to locate some individuals or their refusal to answer the 
questions when located. 

ii. Errors of measurement on a unit. The measuring device may 
be biased or imprecise. With human populations the respondents 
may not possess accurate information or they may give biased answers.

iii. Errors introduced in editing and tabulation of the results.

These sources of error necessitate a modification of the standard
theory of sampling. The principal aims of such a modification are to
provide guidance about the allocation of resources as between the re- 
duction of random sampling errors and the reduction of the other errors, 
and to develop methods for computing standard errors and confidence 
limits that remain valid when the other errors are present. Until re- 
cently, most of the work in sampling theory was concerned with the 
reduction of random sampling errors, and the necessary modifications 
in the theory are at present incomplete. However, although many 
difficulties remain, a good beginning has been made. 


13.2 Effects of non-response. We shall use the term non-response to
refer to the failure to measure some of the units in the selected sample.
In the study of non-response it is convenient to think of the population
as divided into two “strata”: the first consisting of all units for
which measurements would be obtained if the units happened to fall
in the sample, the second of the units for which no measurements
would be obtained. The compositions of the two strata depend
intimately on the methods used to find the units and obtain the data. A
survey in which at least three calls are made, if necessary, on every 
house and in which a supervisor with exceptional powers of persuasion 
calls on all persons who refuse to give data will have a much smaller 
“non-response” stratum than one in which only a single attempt is 
made for every house. 

This division of the population into two distinct strata is, of course, 
an oversimplification. Chance plays a part in determining whether a 
unit is found and measured in a given number of attempts. In a more 
complete specification of the problem, we would attach to each unit 
a probability representing the chance that it would be measured by a 
given field method if it fell in the sample. However, the division into 
two strata is adequate for the analysis to be presented here. 

The sample provides no information about the non-response stratum 
2. This would not matter if it could be assumed that the characteris- 
tics of stratum 2 are the same as those of stratum 1. Where checks 
have been made, however, it has often been found that units in the 
“non-response” stratum differ from units that are measurable. An 
illustration appears in table 13.1. The data come from an experi- 
mental sampling of fruit orchards in North Carolina in 1946. Three 
successive mailings of the same questionnaire were sent to growers. 
For one of the questions—number of fruit trees—complete data were 
available for the population (Finkner, 1950). 


TABLE 13.1 RESPONSES TO THREE REQUESTS IN A MAILED INQUIRY

                                     No. of     % of         Average no. of
                                     growers    population   fruit trees
                                                             per grower
Response to first mailing              300         10            456
Response to second mailing             543         17            382
Response to third mailing              434         14            340
Non-respondents after 3 mailings      1839         59            290
Total population                      3116        100            329


The steady decline in the number of fruit trees per grower in the 
successive responses is evident, these numbers being 456 for respond- 
ents to the first mailing, 382 in the second mailing, 340 in the third, 
and 290 for the refusals to all 3 letters. The total response was poor, 
over half the population failing to give data even after 3 attempts. 
We now consider the effects of non-response on the sample estimate.
Let $N_1$, $N_2$ be the numbers of units in the two strata and let
$W_1 = N_1/N$, $W_2 = N_2/N$, so that $W_2$ is the proportion of
non-response in the population. Assume that a simple random sample is
drawn from the population. When the field work is completed, we have
data for a simple random sample from stratum 1 but no data from stratum 2.
Hence the amount of bias in the sample mean is

    $E(\bar{y}_1) - \bar{Y} = \bar{Y}_1 - \bar{Y} = \bar{Y}_1 - (W_1\bar{Y}_1 + W_2\bar{Y}_2)$
    $\qquad\qquad\quad\; = W_2(\bar{Y}_1 - \bar{Y}_2)$                    (13.1)


The amount of bias is the product of the proportion of non-response
and the difference between the means in the two strata. Since the
sample provides no information about $\bar{Y}_2$, the size of the bias is
unknown unless bounds can be placed on $\bar{Y}_2$ from some source other
than the sample data. With a continuous variate, the only bounds that
can be assigned with certainty are often so wide as to be useless.
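As an illustration of (13.1), the following minimal sketch applies the formula to the rounded figures of table 13.1, treating all respondents to the three mailings as stratum 1 and the remaining growers as stratum 2. The function name and data layout are assumptions made for this example only, and because the table entries are rounded the result is approximate.

```python
# Data from table 13.1: (number of growers, average trees per grower).
respondents = [(300, 456), (543, 382), (434, 340)]  # the three mailings
nonrespondents = (1839, 290)                        # no reply after 3 mailings


def nonresponse_bias(respondents, nonrespondents):
    """Return (W2, Ybar1, Ybar2, bias), with bias = W2 * (Ybar1 - Ybar2)."""
    n1 = sum(count for count, _ in respondents)
    total1 = sum(count * mean for count, mean in respondents)
    n2, ybar2 = nonrespondents
    ybar1 = total1 / n1              # mean of the response stratum
    w2 = n2 / (n1 + n2)              # proportion of non-response
    return w2, ybar1, ybar2, w2 * (ybar1 - ybar2)


w2, ybar1, ybar2, bias = nonresponse_bias(respondents, nonrespondents)
print(f"W2 = {w2:.3f}, respondent mean = {ybar1:.1f}, bias = {bias:.1f}")
```

Subtracting the computed bias from the respondent mean recovers, up to rounding, the population mean of 329 trees given in the table.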

Consequently, with continuous data, any sizable proportion of non-
response usually makes it impossible to assign useful confidence limits
to $\bar{Y}$ from the sample results. We are left in the position of relying
on some guess about the size of the bias, without data to substantiate
the guess.*

* Occasionally it pays to make no attempt to sample in one stratum. An
example occurs when $\bar{Y}_2$ is known to be very small. Without any
sampling of stratum 2, we put $\bar{y}_2 = 0$ as the sample estimate in this
stratum. Hence the sample estimate of $\bar{Y}$ is

    $W_1\bar{y}_1 + W_2(0) = W_1\bar{y}_1$

The bias of this estimate is

    $W_1\bar{Y}_1 - \bar{Y} = -W_2\bar{Y}_2$

If $W_2$ is known and if an upper bound for $\bar{Y}_2$ is known, it may be
found profitable to accept the bias and devote the whole of the sample
to reducing the sampling error of $\bar{y}_1$.

In sampling for proportions the situation is a little easier, since the
unknown proportion $P_2$ in stratum 2 must lie between 0 and 1. If $W_2$
is known, these bounds for $P_2$ enable us to construct confidence limits
for the population proportion $P$. Suppose that a simple random sam-
ple of $n$ units is drawn and that measurements are obtained for $n_1$ of
the units in the sample. Assuming $n_1$ large enough, 95 per cent con-
fidence limits for $P_1$ are given by

    $p_1 \pm 2\sqrt{p_1 q_1 / n_1}$

where $p_1$ is the sample proportion and the fpc is ignored.
When we try to derive a confidence statement about $P$, we are on
safe ground if we assume $P_2 = 0$ when finding $\hat{P}_L$ and $P_2 = 1$ when
finding $\hat{P}_U$. Thus we might take, for 95 per cent limits,

    $\hat{P}_L = W_1\left(p_1 - 2\sqrt{p_1 q_1 / n_1}\right) + W_2(0)$          (13.2)

    $\hat{P}_U = W_1\left(p_1 + 2\sqrt{p_1 q_1 / n_1}\right) + W_2(1)$          (13.3)

It is easy to verify that these limits are conservative, for the state-
ment

    $\hat{P}_L \le P \le \hat{P}_U$

i.e.

    $W_1\left(p_1 - 2\sqrt{p_1 q_1 / n_1}\right) \le W_1 P_1 + W_2 P_2 \le W_1\left(p_1 + 2\sqrt{p_1 q_1 / n_1}\right) + W_2$

is equivalent to the statement

    $p_1 - 2\sqrt{p_1 q_1 / n_1} - \frac{W_2}{W_1} P_2 \le P_1 \le p_1 + 2\sqrt{p_1 q_1 / n_1} + \frac{W_2}{W_1}(1 - P_2)$     (13.4)

Whatever the value of $P_2$, the interval (13.4) always includes the
interval

    $p_1 - 2\sqrt{p_1 q_1 / n_1} \le P_1 \le p_1 + 2\sqrt{p_1 q_1 / n_1}$

Hence

    $\Pr\{\hat{P}_L \le P \le \hat{P}_U\} \ge 0.95$


Although limits can be found in this way if the percentage $W_2$ of
non-response in the population is known, the limits are distressingly
wide unless $W_2$ is very small. Table 13.2 shows the average limits for
a sample size $n = 1000$ and a series of values of $W_2$ and $p_1$. Since the
limits in equations (13.2) and (13.3) depend on the value of $n_1$ (num-
ber of respondents in the sample), we have taken $n_1 = nW_1$, its aver-
age value, in computing table 13.2.

TABLE 13.2 95 PER CENT CONFIDENCE LIMITS FOR P (IN PER CENT)
WHEN n = 1000

% of non-                    Sample percentage, $100p_1$
response,
$100W_2$         5              10              20              50

    0       (3.6, 6.4)     (8.1, 11.9)    (17.5, 22.5)    (46.8, 53.2)
    5       (3.4, 11.1)    (7.6, 16.3)    (16.5, 26.5)    (44.4, 55.6)
   10       (3.2, 15.8)    (7.2, 20.8)    (15.6, 30.4)    (42.0, 58.0)
   15       (3.0, 20.5)    (6.8, 25.2)    (14.7, 34.3)    (39.6, 60.4)
   20       (2.8, 25.2)    (6.3, 29.7)    (13.7, 38.3)    (37.2, 62.8)

The rapid increase in the width of the confidence interval with in-
creasing $W_2$ is evident. It is of interest to examine what values of $n$
would be needed to give the same widths of confidence interval if $W_2$
were zero. This is easily done when $p_1$ is 50 per cent. For $W_2 = 5$
per cent, table 13.2 shows that the half-width of the confidence inter-
val is 5.6. The equivalent sample size $n_e$, assuming no non-response,
is found from the equation

    $5.6 = 2\sqrt{\frac{(50)(50)}{n_e}}$

    $n_e = 320$


For $W_2$ = 10 per cent, 15 per cent, and 20 per cent, the values of $n_e$
are 155, 90, and 60, respectively. It is evidently worth while to devote
a substantial proportion of the resources to the reduction of non-
response.
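The conservative limits (13.2) and (13.3), and the equivalent sample size $n_e$, can be sketched in a few lines. This is an illustration under the text's stated simplifications (the multiplier 2 for 95 per cent limits, $n_1$ replaced by its average value $nW_1$, fpc ignored); the function names are assumptions introduced here, and the equivalent-size calculation is meant for the case $p_1 = 0.5$ discussed above.

```python
from math import sqrt


def conservative_limits(p1, w2, n, z=2.0):
    """Limits (13.2) and (13.3), with n1 set to its average value n * W1."""
    w1 = 1.0 - w2
    n1 = n * w1
    half = z * sqrt(p1 * (1.0 - p1) / n1)
    lower = w1 * (p1 - half) + w2 * 0.0   # worst case P2 = 0
    upper = w1 * (p1 + half) + w2 * 1.0   # worst case P2 = 1
    return lower, upper


def equivalent_sample_size(p1, w2, n, z=2.0):
    """Sample size giving the same half-width if there were no non-response."""
    lower, upper = conservative_limits(p1, w2, n, z)
    half_width = (upper - lower) / 2.0
    return z ** 2 * p1 * (1.0 - p1) / half_width ** 2


for w2 in (0.05, 0.10, 0.15, 0.20):
    lo, hi = conservative_limits(0.5, w2, 1000)
    ne = equivalent_sample_size(0.5, w2, 1000)
    print(f"W2 = {w2:.2f}: ({100 * lo:.1f}, {100 * hi:.1f}), n_e about {ne:.0f}")
```

The unrounded values of $n_e$ differ slightly from the figures quoted in the text, which start from half-widths rounded to one decimal place.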

It may be objected that the limits in table 13.2 are much too con- 
servative, since we have supposed that the worst possible cases have 
actually happened. Since, moreover, the limits are frequently too 
wide to be useful, it is always tempting to make some guesses about 
the bounds within which $P_2$ lies and construct much narrower “confi-
dence” limits for $P$ based on these bounds. There is nothing wrong
with this procedure if the bounds are correct, but we should recognize
that the procedure represents the substitution of guesswork for objec-
tive evidence.
If we wished the absolute error in the sample proportion to be less
than $d$, we would take (by section 4.4)

    $n = \frac{t_\alpha^2 PQ}{d^2}$

where $t_\alpha$ is the normal deviate corresponding to the risk $\alpha$ that the
error exceeds $d$. With no advance information about $P$, we would
take $P = 0.5$ as the least favorable case, giving

    $n = \frac{t_\alpha^2}{4d^2}$                                   (13.5)

By taking the least favorable combination of the bias $W_2(P_1 - P_2)$
and the value of $P_1$, Birnbaum and Sirken show that a value of $n$
which still guarantees an error less than $d$, with risk $\alpha$, is

    $n \doteq \frac{t_\alpha^2}{4d(d - W_2)} - 1$                      (13.6)

Note that no value of $n$ suffices if $W_2 > d$. If $W_2 = 0$, this equation
reduces to (13.5) apart from the term $-1$, which comes from an ap-
proximation in the analysis. Some values of $n$ given by Birnbaum
and Sirken’s method are shown in table 13.3.


TABLE 13.3 SMALLEST VALUE OF n FOR GIVEN LIMIT OF ERROR d, WITH
RISK $\alpha$ = 0.05

% non-                  d (in per cent)
response,
$100W_2$         20       15       10        5

    0
    2            27       50      122      653
    4            31       60      166     2000
    6            36       75      255
    8            43       99      521
   10            53      142
   15           112


The table tells the same sad story as table 13.2. If we are content
with a crude estimate ($d$ = 20), amounts of non-response up to 10
per cent can be handled by doubling the sample size. However, any
sizable percentage of non-response makes it impossible or very costly
to attain a highly guaranteed precision by increasing the sample size
among the respondents.
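The two sample-size rules can be compared numerically. The sketch below assumes that (13.6) has the form $n \doteq t_\alpha^2 / \{4d(d - W_2)\} - 1$, which agrees with the remarks that no $n$ suffices when $W_2$ reaches $d$ and that the rule reduces to (13.5) less 1 when $W_2 = 0$. The function names are illustrative, and the values need not reproduce table 13.3 exactly, since that table is described as coming from Birnbaum and Sirken's method rather than from this approximation.

```python
def n_full_response(d, t=1.96):
    """Equation (13.5): least favorable case P = 0.5, no non-response."""
    return t ** 2 / (4.0 * d ** 2)


def n_with_nonresponse(d, w2, t=1.96):
    """Approximation (13.6); returns None when no sample size suffices."""
    if w2 >= d:
        return None
    return t ** 2 / (4.0 * d * (d - w2)) - 1.0


for w2 in (0.0, 0.02, 0.04, 0.06, 0.08, 0.10):
    n = n_with_nonresponse(0.10, w2)
    shown = "no n suffices" if n is None else f"{n:.0f}"
    print(f"d = 0.10, W2 = {w2:.2f}: {shown}")
```

The denominator $d - W_2$ makes the required sample size grow without limit as the non-response rate approaches the tolerated error.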


13.3 Optimum sampling fraction among the non-respondents. The 
non-respondents in a survey may be divided into two classes. One— 
the hard core—consists of units for which it is impossible to obtain 
data. With human populations, this class contains people who cannot 
be found, who adamantly refuse or who are for some reason incapable 
of giving the data, and for whom data cannot be supplied by another 
person. The second class consists of units for which data can be ob- 
tained by more intensive and costly field methods than those originally 
contemplated for the survey. This class contains people not at home 
on the first call and people who are reluctant, but can be persuaded, to 
give data. 

Hansen and Hurwitz (1946), to whom the results in this section are 
due, propose that this class be sampled with a smaller sampling fraction 
than the units in stratum 1. 

The first step is to take a simple random sample of $n$ units, using
the ordinary field methods. Let $n_1$ be the number of units in the sam-
ple that provide the data sought, and $n_2$ the number in the non-re-
sponse group. By more intensive efforts, the data are later obtained
from a random sample of $r_2$ out of the $n_2$. Let

    $n_2 = k r_2 \qquad (k > 1)$                              (13.7)

Then the average sampling fraction in the first stratum is $k$ times that
in the second. This follows because if $k$ is fixed in advance

    $E\left(\frac{n_1}{N_1}\right) = k\,E\left(\frac{r_2}{N_2}\right)$

The values of $n$ (initial size of sample) and $k$ are chosen so as to give
a specified precision for the lowest cost.

The cost of taking the sample is

    $C = c_0 n + c_1 n_1 + c_2 r_2$

where the $c$’s are costs per unit: $c_0$ is the cost of making the first at-
tempt, while $c_1$ and $c_2$ are the costs of getting and processing the data
in the two strata, respectively. Since the values of $n_1$ and $n_2$ are not
known until the first attempt is made, the expected cost is used in
planning the sample. The expected values of $n_1$ and $r_2$ are, respec-
tively, $W_1 n$ and $W_2 n/k$. Thus expected cost is

    $C = c_0 n + c_1 W_1 n + \frac{c_2 W_2 n}{k}$                  (13.8)


Let $\bar{y}_1$, $\bar{y}_{2r}$ be the sample means in the two strata. The subscript $r$
is introduced as a reminder that the sample in the second stratum is of
size $r_2$. As an estimate of the population mean, we take

    $\bar{y}' = \frac{1}{n}(n_1\bar{y}_1 + n_2\bar{y}_{2r})$                      (13.9)

Note that the second stratum receives a weight $n_2$, although the sam-
ple is only of size $r_2$. This is done in order to obtain an unbiased
estimate.
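The weighting in (13.9) can be made concrete with a short sketch. The numbers below are hypothetical and chosen only to show how the estimate differs from a naive pooled mean that weights the second stratum by its subsample size $r_2$; the function name is an assumption of this example.

```python
def hansen_hurwitz_mean(n1, ybar1, n2, ybar2r):
    """Estimate (13.9): stratum 2 is weighted by n2, not by r2."""
    n = n1 + n2
    return (n1 * ybar1 + n2 * ybar2r) / n


# Hypothetical survey: 600 respondents, 400 non-respondents, of whom
# r2 = 100 are followed up intensively (so k = 4).
n1, ybar1 = 600, 10.0
n2, r2, ybar2r = 400, 100, 5.0

estimate = hansen_hurwitz_mean(n1, ybar1, n2, ybar2r)
naive_pooled = (n1 * ybar1 + r2 * ybar2r) / (n1 + r2)  # under-weights stratum 2
print(estimate, round(naive_pooled, 3))
```

In this made-up case the pooled mean is pulled toward the respondents because the 100 follow-up cases stand for 400 non-respondents but are counted only once each.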

This procedure is an application of double sampling with stratifica-
tion. The first or “large” sample, of size $n$, gives an estimate $n_1/n_2$
of the relative size of the strata. The second or “small” sample is of
size $n_1$ in the first stratum and $r_2$ in the second stratum. Unfortu-
nately, the variance of $\bar{y}'$ cannot be derived from the variance formulas
which were given in section 12.2 for double sampling with stratifica-
tion. In section 12.2, the sizes $n_h$ in the second sample were assumed
fixed, whereas in the present problem $n_1$ and $r_2$ are random variables.

To find $V(\bar{y}')$, write

    $\bar{y}' = \frac{1}{n}(n_1\bar{y}_1 + n_2\bar{y}_{2n}) + \frac{n_2}{n}(\bar{y}_{2r} - \bar{y}_{2n})$       (13.10)

where $\bar{y}_{2n}$ is the mean of the whole sample of size $n_2$ from stratum 2.
The first term on the right is the mean of a random sample of size $n$
from the whole population. Its variance is therefore

    $\frac{(N - n)}{N}\frac{S^2}{n}$

where $S^2$ is the variance of the whole population. Further, when we
find the variance of $\bar{y}'$, there is no contribution from cross-products
between the first and second terms. For

    $E\{\bar{y}_{2n}(\bar{y}_{2r} - \bar{y}_{2n})\} = 0$

over all random samples of size $r_2$ that can be drawn from a fixed
sample of size $n_2$.

Consider the second term on the right of (13.10). If $\bar{Y}_2$ is the popu-
lation mean of the “non-response” stratum, we have

    $(\bar{y}_{2r} - \bar{Y}_2) = (\bar{y}_{2r} - \bar{y}_{2n}) + (\bar{y}_{2n} - \bar{Y}_2)$

so that

    $E(\bar{y}_{2r} - \bar{Y}_2)^2 = E(\bar{y}_{2r} - \bar{y}_{2n})^2 + E(\bar{y}_{2n} - \bar{Y}_2)^2$

there being no contribution from cross-product terms for the same
reason as before. Now $\bar{y}_{2r}$ is the mean of a simple random sample of
size $r_2$ from the second stratum, and $\bar{y}_{2n}$ is the mean of a simple random
sample of size $n_2$ from the same stratum. Hence, for fixed $n_2$ and $r_2$,

    $\frac{(N_2 - r_2)}{N_2}\frac{S_2^2}{r_2} = E(\bar{y}_{2r} - \bar{y}_{2n})^2 + \frac{(N_2 - n_2)}{N_2}\frac{S_2^2}{n_2}$

where $S_2^2$ is the variance within the “non-response” stratum. This
gives

    $E(\bar{y}_{2r} - \bar{y}_{2n})^2 = S_2^2\left(\frac{1}{r_2} - \frac{1}{n_2}\right) = S_2^2\frac{(n_2 - r_2)}{n_2 r_2} = \frac{(k - 1)}{n_2} S_2^2$

since $n_2 = k r_2$.

Hence, adding the variances of the two terms in (13.10), we find, for
fixed $n_2$,

    $V(\bar{y}') = \frac{(N - n)}{N}\frac{S^2}{n} + \left(\frac{n_2}{n}\right)^2 \frac{(k - 1)}{n_2} S_2^2$

    $\qquad\;\; = \frac{(N - n)}{N}\frac{S^2}{n} + \frac{(k - 1)}{n}\frac{n_2}{n} S_2^2$            (13.11)

Since $E(n_2) = nW_2$, this gives for the expected variance

    $\bar{V}(\bar{y}') = \frac{(N - n)}{N}\frac{S^2}{n} + \frac{(k - 1)W_2}{n} S_2^2$              (13.12)

The first term is the variance that would be obtained if all $n_2$ in the
non-response group were sampled. The second term is the increase in
variance from sampling only $r_2$ of the $n_2$.
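The expected variance (13.12) is easy to evaluate directly. The sketch below is illustrative only: the function name and argument names are assumptions, and the population size defaults to infinity so that the fpc drops out.

```python
def expected_variance(s2, s22, w2, n, k, N=float("inf")):
    """Expected variance (13.12) of the estimate (13.9).

    s2  : variance S^2 of the whole population
    s22 : variance S_2^2 within the non-response stratum
    w2  : proportion of non-response W_2
    n   : initial sample size; k : subsampling ratio n2 / r2
    """
    fpc = 1.0 - n / N                           # equals 1 when N is infinite
    full_followup = fpc * s2 / n                # all n2 non-respondents sampled
    subsampling = (k - 1.0) * w2 * s22 / n      # extra variance from using r2
    return full_followup + subsampling


# With k = 1 every non-respondent is followed up and only the first term remains.
print(expected_variance(1.0, 1.0, 0.5, 1000, 1.0))
print(expected_variance(1.0, 1.0, 0.5, 1870, 2.739))
```

With equal variances, $W_2 = 0.5$, $n = 1870$ and $k = 2.739$, the result is very close to $S^2/1000$, the precision of a fully responding sample of 1000.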

The quantities $n$ and $k$ are then chosen to minimize average cost
(13.8) for a preassigned value of the expected variance (13.12).
The solutions are:

    $k_{opt} = \sqrt{\frac{c_2(S^2 - W_2 S_2^2)}{S_2^2(c_0 + c_1 W_1)}}$                    (13.13)

    $n_{opt} = \frac{N\{S^2 + (k - 1)W_2 S_2^2\}}{NV + S^2}$                    (13.14)

where $V$ is the value specified for the variance of the estimated popula-
tion mean.

The solutions require a knowledge of $W_2$: often this can be estimated
from previous experience. In addition to $S^2$, which must be
estimated in advance in any “sample size” problem, the solutions also
involve $S_2^2$, the variance in the non-response stratum. The value of
$S_2^2$ is naturally harder to predict; it will probably not be the same as
$S^2$. For instance, in surveys made by mail of most kinds of economic
enterprise, the respondents tend to be larger operators, with larger
between-unit variances than the non-respondents.

This technique was first presented for a survey made by mail, fol-
lowed by visits to a subsample of those who do not answer the letters.
With a mail survey, $W_2$ may be large and its value may be difficult to
predict for determining $n_{opt}$. In this event a satisfactory approxima-
tion is to work out the value of $n_{opt}$ for a range of assumed values of
$W_2$ between 0 and a safe upper limit. The maximum $n_{opt}$ in this series
is adopted as the initial sample size $n$. When the replies to the mail
survey have been received, the value of $n_2$ is known. The variance
formula (13.11) is then solved to find the value of $k$ that gives the de-
sired variance $V$. The cost for this method is usually only slightly
higher than the optimum cost which would have applied if $W_2$ were
known.
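The two-step procedure for an unpredictable $W_2$ can be sketched as follows, assuming the forms $k_{opt} = \sqrt{c_2(S^2 - W_2 S_2^2)/\{S_2^2(c_0 + c_1 W_1)\}}$ for (13.13) and, with $N$ very large, $n_{opt} = \{S^2 + (k-1)W_2 S_2^2\}/V$ for (13.14). The function names, the grid of trial values of $W_2$, the unit costs, and the observed counts are all hypothetical choices for illustration.

```python
from math import sqrt


def k_opt(c0, c1, c2, w2, s2=1.0, s22=1.0):
    """Optimum k from (13.13); never below 1 (k = 1 is complete follow-up)."""
    w1 = 1.0 - w2
    return max(1.0, sqrt(c2 * (s2 - w2 * s22) / (s22 * (c0 + c1 * w1))))


def n_opt(v, k, w2, s2=1.0, s22=1.0):
    """Initial sample size from (13.14), taking N as very large."""
    return (s2 + (k - 1.0) * w2 * s22) / v


def k_for_observed_n2(v, n, n2, s2=1.0, s22=1.0):
    """Solve (13.11), with N very large, for k once n2 is known.

    A result below 1 would mean the desired variance cannot be met with this n.
    """
    return 1.0 + (v - s2 / n) * n ** 2 / (n2 * s22)


# Step 1: take the largest n_opt over trial values W2 = 0.0, 0.1, ..., 0.6.
c0, c1, c2, v = 0.1, 0.4, 4.5, 1.0 / 1000.0
trial_w2 = [i / 10 for i in range(0, 7)]
n_initial = max(n_opt(v, k_opt(c0, c1, c2, w2), w2) for w2 in trial_w2)

# Step 2: suppose 1870 letters were sent and 935 were not returned.
k_needed = k_for_observed_n2(v, n=1870, n2=935)
print(round(n_initial), round(k_needed, 2))
```

Choosing the maximum $n_{opt}$ over the grid guards against too small an initial sample, and the value of $k$ is then adjusted after the fact to hit the target variance.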

Example. This example is condensed from the paper by Hansen 
and Hurwitz (1946). The first sample is taken by mail and the re-
sponse rate $W_1$ is expected to be 50 per cent. The precision desired is
that which would be given by a simple random sample of size 1000 if 
there were no non-response. The cost of mailing a questionnaire is 
10 cents, and the cost of processing the completed questionnaire is 
40 cents. To carry out a personal interview costs $4.10. 

How many questionnaires should be sent out, and what percentage 
of the non-respondents should be interviewed? 

In terms of the cost function (13.8) the unit costs in dollars are as
follows:

    $c_0$ = Cost of first attempt = 0.1
    $c_1$ = Cost of processing data for a respondent = 0.4
    $c_2$ = Cost of obtaining and processing data
            for a non-respondent = 4.5

The optimum $n$ and $k$ can be found from equations (13.13) and
(13.14). If the variances $S^2$ and $S_2^2$ are assumed equal, and $N$ is as-
sumed very large, then

    $k_{opt} = \sqrt{\frac{c_2(1 - W_2)}{c_0 + c_1 W_1}} = \sqrt{\frac{(4.5)(0.5)}{0.1 + (0.4)(0.5)}} = \sqrt{7.5} = 2.739$

    $n_{opt} = \frac{S^2}{V}\{1 + (k - 1)W_2\} = 1000\{1 + (1.739)(0.5)\}$
    $\qquad\; = 1870$
Note that we have put $S^2/V = 1000$, or $V = S^2/1000$, since this is the
variance that the sample mean would have if a sample of 1000 were
taken and complete response were obtained.

Consequently, 1870 questionnaires should be mailed. Of the 935
that are not returned, we interview a random subsample of 935/2.739
or 341. The cost will be found to be $2095.
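The arithmetic of this example can be checked with a short script. It is a sketch under the example's own assumptions ($S^2 = S_2^2$, $N$ very large, $W_1 = W_2 = 0.5$); the variable names are illustrative, and the mailing size is rounded to 1870 as in the text.

```python
from math import sqrt

c0, c1, c2 = 0.1, 0.4, 4.5   # unit costs in dollars
w1 = w2 = 0.5                # expected response and non-response rates
n0 = 1000                    # S^2 / V: size needed with complete response

# Equal variances and very large N reduce (13.13) and (13.14) to:
k = sqrt(c2 * (1.0 - w2) / (c0 + c1 * w1))
n = n0 * (1.0 + (k - 1.0) * w2)

n_mail = 1870                          # the text's rounded value of n
not_returned = round(n_mail * w2)      # expected non-returns
interviews = round(not_returned / k)   # subsample to be interviewed
returned = n_mail - not_returned

cost = c0 * n_mail + c1 * returned + c2 * interviews
print(round(k, 3), round(n, 1), interviews, round(cost, 2))
```

The cost line follows (13.8) with the expected counts substituted: one mailing per unit, processing for each returned questionnaire, and the combined interview and processing cost for each non-respondent interviewed.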

With stratified sampling, the optimum values of the $n_h$ and the $k_h$
in the individual strata are rather complex. A good approximation is
to estimate first, by the methods in sections 5.5 and 5.6, the sample
sizes $n_{0h}$ that would be required in the strata if there were no non-
response. Now from equation (13.14), if $W_2 = 0$, we have

    $n_0 = \frac{NS^2}{NV + S^2}$

Hence equation (13.14) can be rewritten as

    $n_{opt} = n_0\{1 + (k - 1)W_2 S_2^2/S^2\}$                     (13.15)

This equation, applied separately to each stratum, gives an approxi-
mation to the optimum $n_h$. The values of $k_h$ are found by applying
equation (13.13) in each stratum.

These techniques can be used with ratio or regression estimates.
With the ratio estimate, the quantities $S^2$ and $S_2^2$ are replaced by $S_d^2$
and $S_{2d}^2$, where $d_i = y_i - Rx_i$. With a regression estimate, $S^2$ be-
comes $S^2(1 - \rho^2)$ and $S_2^2$ becomes $S_2^2(1 - \rho^2)$.


13.4 Other techniques for non-response. One method which is some-
times thought to solve the non-response problem is to collect data for
some neighboring unit whenever a non-response is encountered. For
instance, if a house belonging to the sample is found to have no one at
home, the next house in the street is visited. In this way the size of
sample for which data are obtained remains equal to the size originally
planned.

All that this method accomplishes is to maintain the size of the sample
in the first stratum. It does not obtain any data from the non-response
stratum, and we should not be deluded into thinking that non-response
has been adequately dealt with.
When non-response is due primarily to absence of people from their
homes, an ingenious approach has been made by Politz and Simmons
(1949, 1950). Suppose that all calls are made between the hours of
6 p.m. and 9 p.m. during the period of field work (say a month) and
that only one call is made per respondent. If a person is always at
home during these hours in this month, he is certain to be found if he
falls in the sample. A person who is home half the time during these
hours has a 50 per cent chance of being at home when the interviewer
calls, if we can assume that the call takes place at a random instant of
time during the period.

The persons in the population can be classified into strata according
to the proportion of the time that they are at home. Let this propor-
tion be $\pi_h$ in stratum $h$. If a member of stratum $h$ falls in the sample,
the probability that he is found is $\pi_h$. This approach is a generaliza-
tion of the mathematical model which we adopted in sections 13.2 and
13.3. In the earlier model there were only two strata, with $\pi_h = 1$
and 0, respectively.

In the new approach, the sample for which data are obtained is seen
to be overweighted with persons who are at home most of the time.
If these persons differ in their characteristics from those who are less
frequently at home, a non-response bias is produced. Given the
values of the $\pi_h$, most of this bias can be removed. The artificial data
in table 13.4 illustrate the process.


TABLE 13.4 DATA FOR ILLUSTRATING RESULTS WITH VARYING PROPORTIONS OF
“NOT-AT-HOMES”

  $\pi_h$      $n_h$     $n_h'$    $\bar{Y}_h = \bar{y}_h$     $y_h$     $y_h/\pi_h$
  1         100      100         1            100       100
  0.5       100       50         2            100       200
  0.25      100       25         3             75       300
  Totals    300      175                      275       600

There are 3 strata of equal size, with $\pi_h$ = 1, 0.5, and 0.25. An
initial sample size of $n = 300$ is planned (100 of which would fall in
each stratum, on the average). Owing to the absences from home, the
actual sample sizes $n_h'$ average 100, 50, and 25, respectively.

With the assumed values of $\bar{Y}_h$, the true population mean $\bar{Y}$ is 2.
We have ignored the within-stratum sampling variances, taking $\bar{y}_h
= \bar{Y}_h$. The observed sample total $y$ comes out as 275, and the sample
mean is 275/175 = 1.57. This is negatively biased because the “not-
at-homes” have higher values of $\bar{Y}_h$ than the “at-homes.”
An estimate free from bias is obtained by weighting the total from
any stratum inversely as the proportion of responses. Thus

    $\bar{y}' = \frac{1}{n}\sum_h \frac{y_h}{\pi_h} = \frac{600}{300} = 2$

where $n$ is the original size of sample.

In practical applications some bias remains in the estimate, because
persons with $\pi_h = 0$ are not represented at all in the sample.
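The calculation in table 13.4 can be reproduced with a few lines of code. As in the table, within-stratum sampling variation is ignored, so each observed stratum mean equals the assumed stratum mean; the variable names are illustrative.

```python
# Each stratum: (pi_h, planned sample size n_h, stratum mean Ybar_h).
strata = [(1.0, 100, 1.0), (0.5, 100, 2.0), (0.25, 100, 3.0)]

n = sum(n_h for _, n_h, _ in strata)                      # planned size, 300
found = [pi * n_h for pi, n_h, _ in strata]               # n_h' = 100, 50, 25
totals = [f * ybar for f, (_, _, ybar) in zip(found, strata)]  # y_h

# Unweighted mean of the persons actually found at home.
naive_mean = sum(totals) / sum(found)

# Weight each stratum total inversely to pi_h, then divide by the planned n.
weighted_mean = sum(y / pi for y, (pi, _, _) in zip(totals, strata)) / n

print(round(naive_mean, 2), weighted_mean)
```

The unweighted mean understates the population mean of 2, while the weighted estimate recovers it, because dividing by $\pi_h$ restores each stratum total to what the full planned sample would have given.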

The chief problem is to estimate the $\pi_h$. In the method proposed by
Politz and Simmons, the interviewer asks the respondent, for each of
the 5 previous nights, whether he was at home at the particular time
of day at which the question is asked. The respondents are then
classified into 6 strata, with estimated values of $\pi_h$ from 1/6 to 6/6
inclusive. If the time at which the question is asked can be considered
random, this technique gives unbiased estimates of $\pi_h$, but the
variances of the estimates may be rather high. For instance, a person
who was regularly at home from 6 p.m. to 7 p.m. but out from 7 p.m. to
9 p.m. would report a $\pi_h$ of unity if the interviewer called before
7 p.m. and would not be in the sample if the interviewer called after
7 p.m. An alternative is to ask about 5 random instants of time between
6 p.m. and 9 p.m.

The assumption that the time at which the interviewer calls is ran-
dom is, of course, open to question. Some judgment about the reason-
ableness of the assumption can be obtained from an analysis of the
times at which interviewers do call.

The variance formula for this procedure and a comparison with the
“call-back” method of reducing non-response are given in the 1949
Politz and Simmons reference. Persons interested in using the method
should study this reference, since the presentation given here is over-
simplified, and the assumptions needed to obtain unbiased estimates
of $\bar{Y}$ require careful statement.


13.5 Errors of measurement. This term will be used in a broad
sense, to denote any difference between the correct value $\eta_i$ for an item
on the $i$th unit and the value $y_i$ which is assigned to that item in the
computations from which the estimates are made. In this sense,
errors of measurement include errors introduced in recording, editing,
and tabulating the data, as well as errors that result from deficiencies
in the measuring device.

The problem of errors of measurement is an old one in physics,
chemistry, and biology. A large amount of information has been
gathered about the behavior both of instruments of measurement and
of the human observer who plays a role in many measuring techniques.
Much of this knowledge should be applicable to errors of measurement
which occur in sample surveys.

Less is known about errors of measurement when the information
is given verbally in response to a verbal question in an interview.
Since the interplay between two individuals is involved, the pattern 
of errors may be complex. The importance of this topic is now real- 
ized by agencies which regularly sample human populations, and re- 
search studies of interviewing errors are on the increase. 


13.6 A mathematical model for errors of measurement. We now
construct a mathematical description of some of the major components
of errors of measurement. Consider a very large number of repetitions
of the measurement on the $i$th unit, and let $y_{i\alpha}$ be the value obtained
in the $\alpha$th repetition. Then we write

    $y_{i\alpha} = \eta_i + g_i' + e_{i\alpha}$                          (13.16)

where $\eta_i$ = Correct value.
      $g_i'$ = Bias component.
      $e_{i\alpha}$ = “Random” component, with mean 0.

This model assumes that in repeated measurements of the $i$th unit
the errors $(g_i' + e_{i\alpha})$ follow some frequency distribution with mean $g_i'$.
With a well-controlled measuring instrument, the frequency distribu-
tion is often approximately normal in shape. With a new and com-
plex measuring process, on the other hand, it cannot be taken for
granted that repetitions will follow any single frequency distribution:
the shape may change erratically with time. Such cases are not cov-
ered by the theory presented here, since we assume that $e_{i\alpha}$ acts like a
random variate in the probability sense.

The next step is to consider how these components change when we 
move from one unit to another. Various complications may occur. 

For the bias component $g_i'$, there may be a constant bias $g$ that
affects all units alike. There may be a component $g_i$ which follows a
fairly simple frequency distribution over the population. This com-
ponent may be correlated with the correct value $\eta_i$: for instance, the
measuring device may consistently underestimate high values of $\eta_i$
and overestimate low values.

There may be a complex pattern of interrelationships between the
values of $g_i$ on different units, quite apart from any correlation that
is created as a result of the correlation between $g_i$ and $\eta_i$. The simplest
example is the “interviewer bias.” Dramatic differences have some-
times been found in the mean values of $y_i$ obtained by different inter-
viewers who are sampling apparently comparable parts of the same
population (see Lienau, 1941, and Mahalanobis, 1946). The same
effect has appeared when samples of a growing crop are cut by different
teams and when chemical or biological analyses are done in different
laboratories. The human factor is not the only cause for correlations
among units that are measured at about the same time. Many meas-
uring devices are affected by the weather: some use raw materials
whose quality varies from batch to batch.

To turn to the “random” component of error $e_{i\alpha}$, the frequency
distribution which it is presumed to follow may change from one unit
to another, although with a good system of measurement such changes
should be slight. As with $g_i$, there may be correlations between the
values of $e_{i\alpha}$ on different units.

The components of the error of measurement are summarized in
table 13.5.

TABLE 13.5 COMPONENTS OF THE ERROR OF MEASUREMENT ON THE iTH UNIT

Notation     Nature of component

$g$          Constant bias over all units.

$g_i$        “Variable” component of bias, which follows some fre-
             quency distribution with mean zero, as $i$ varies, and may
             be correlated with the correct value $\eta_i$.

$e_{i\alpha}$       “Random” component of error, which follows some fre-
             quency distribution, with mean zero, as $\alpha$ varies for fixed $i$.
             These frequency distributions may change in shape as $i$
             varies.

We have noted further that values of $g_i$ on different units may be
correlated with one another, and similarly for values of $e_{i\alpha}$ on different
units.

A more detailed account of the construction of a model of this kind
is given by Hansen et al. (1951). This paper contains a number of the
results given in later sections.


13.7 Effects of constant bias. Suppose that the measurements $y_i$ on
all units are subject to a constant bias $g$ whose magnitude is unknown.
Then the sample mean $\bar{y}$ of a simple random sample is also subject to
bias $g$. In the estimated error variance which we attach to the sample
mean, the bias cancels out, since this estimate is derived from a sum
of squares of terms $(y_i - \bar{y})$. Consequently, the usual computation
of the confidence limits for $\bar{Y}$ from the sample data takes no account
of the bias. The same results hold in stratified random sampling.
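A small numerical check makes the cancellation visible. The data values and the size of the bias below are hypothetical; only the standard-library `statistics` functions `mean` and `variance` are used.

```python
from statistics import mean, variance

correct_values = [12.0, 15.0, 9.0, 20.0, 14.0]   # hypothetical correct values
g = 3.0                                           # hypothetical constant bias
measured = [value + g for value in correct_values]

shift_in_mean = mean(measured) - mean(correct_values)
s2_correct = variance(correct_values)    # sum of squared deviations / (n - 1)
s2_measured = variance(measured)

print(shift_in_mean, s2_correct, s2_measured)
```

The mean moves by exactly the bias while the sample variance, and hence the estimated standard error, is unchanged, which is why the sample data alone give no warning of a constant bias.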
The situation is also essentially the same with regression and ratio
estimates. Consider the regression estimate

    $\bar{y}_{lr} = \bar{y} + b(\bar{X} - \bar{x})$

where both the $y_i$ and the $x_i$ may be subject to constant biases $g_y$ and
$g_x$, respectively. Since the least squares estimate $b$ remains unchanged,
and since the bias $g_x$ cancels out of the term $(\bar{X} - \bar{x})$, it follows that
$\bar{y}_{lr}$ is subject to a bias $g_y$. It is easy to verify that the sample estimate
of $V(\bar{y}_{lr})$ contains no contribution due to the biases.


With the ratio estimate

    $\bar{y}_R = \frac{\bar{y}}{\bar{x}}\bar{X}$

the bias is also $g_y$, to a first approximation, since in large samples
$E(\bar{X}/\bar{x})$ is approximately 1 even if the $x_i$ are subject to a constant bias.
In large samples the sample estimate of variance

    $v(\bar{y}_R) = \frac{(N - n)}{Nn}\frac{\sum (y_i - \hat{R}x_i)^2}{n - 1}$

will be almost free from bias as an estimate of

    $E(\bar{y}_R - \bar{Y})^2$

i.e. as an estimate of the variance about the biased mean $\bar{Y}$.*

* In one respect the ratio estimate is more affected by constant biases
than the regression or mean per unit estimates. If the relation between
$\eta_i$ and $x_i$ is a straight line through the origin, and $y_i = \eta_i + g_y$, the
linear relation between $y_i$ and $x_i$ does not pass through the origin. The
consequence is that the precision of the ratio estimate, as an estimate of
the biased mean $\bar{Y}$, is affected by constant bias $g_y$ or $g_x$. The point is
not of practical importance if decisions about the use of the ratio esti-
mate are based on the observed relation between $y_i$ and $x_i$.

To summarize, a constant bias passes undetected by the sample
data. As we have seen (section 1.5), the 95 per cent confidence proba-
bilities are almost unaffected if the ratio of $g_y$ to the standard error of
the estimated mean is less than 0.1, but as the ratio increases beyond
this value the computation of confidence limits becomes misleading.
It should also be remembered that when a number of independent
estimates, all subject to the same bias, are averaged, this ratio in-
creases, since the bias remains constant but the standard error of the
average diminishes. Estimates of change from one time period to an-
other, or from one stratum to another, remain unbiased, provided that
the bias is constant throughout.
13.8 Effects of components that are independent from unit to unit.
If constant bias is ignored, the model becomes

    $y_{i\alpha} = \eta_i + g_i + e_{i\alpha}$

The distinction between the bias $g_i$ and the random error of measure-
ment $e_{i\alpha}$ on the $i$th unit must be preserved if we are discussing sampling
plans in which the same unit is measured several times in order to in-
crease precision. Since discussion will be confined to the case where
any unit is measured only once, the error $(g_i + e_{i\alpha})$ can be combined
into a single term $e_{i\alpha}$. The model simplifies to

    $y_{i\alpha} = \eta_i + e_{i\alpha}$


The nature of the population should be clearly realized. In repeated
sampling, whenever the ith unit appears in a sample the value of $\eta_i$
remains unchanged, since $\eta_i$ is the correct value for this unit.
However, a new measurement of the unit is made, giving a new value to
the error of measurement $\epsilon_{i\alpha}$. In any expressions which contain
$\epsilon_{i\alpha}$, we average both over all possible measurements of the unit
and also over all units or all samples of a given type.

Since constant bias is ignored, the average value of $\epsilon_{i\alpha}$ over the
population is taken as zero. This may be written

$$E_i E_\alpha \, \epsilon_{i\alpha} = 0$$

where $E_\alpha$ denotes an average over all measurements of the ith unit
and $E_i$ a subsequent average over all units. [...] correlated with
$\eta_i$, and it m[...] $\epsilon_{i\alpha}$ on any two different units [...]
probability sense, the sampling [...] remain valid, provided that
the population can be regarded as infinite.

This is easily seen for simple random sampling. The independent
[...] guarantees that the successive values of $\eta_i$ [...] independent.
It does not guarantee that the values of the $\epsilon_{i\alpha}$ are mutually
independent, but we have assumed that this is so. Hence, the $y_{i\alpha}$ are
independent on successive units, and the ordinary theory for random
sampling from an infinite population is applicable to the $y_{i\alpha}$.
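In particular, if $E(\eta_i) = H$ is the true population mean,

$$V(\bar{y}) = E(\bar{y} - H)^2 = \frac{\sigma_y^2}{n} = \frac{1}{n}\,(\sigma_\eta^2 + \sigma_\epsilon^2 + 2\rho_{\eta\epsilon}\sigma_\eta\sigma_\epsilon) \qquad (13.17)$$

where

$$\sigma_\eta^2 = E(\eta_i - H)^2$$
$$\sigma_\epsilon^2 = E_i E_\alpha(\epsilon_{i\alpha}^2)$$
$$\rho_{\eta\epsilon}\sigma_\eta\sigma_\epsilon = E_i E_\alpha\{\epsilon_{i\alpha}(\eta_i - H)\}$$

Further, since the $y_{i\alpha}$ are independent members of a simple random
sample from an infinite population,

$$v(\bar{y}) = \frac{s_y^2}{n} = \frac{\sum (y_{i\alpha} - \bar{y})^2}{n(n - 1)}$$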


is an unbiased sample estimate of $V(\bar{y})$. In this case, fortunately, the
ordinary formula for the estimated sampling error remains valid when
errors of measurement are present. In particular, two measuring
processes can be compared empirically by finding which gives the
smaller value of $v(\bar{y})$ for a given cost.

In the same way it can be shown that the formulas given in previous
chapters for the sample estimates of error variances remain valid for
stratified sampling and for ratio and regression estimates, provided
that the errors of measurement in both $y_{i\alpha}$ and $x_{i\alpha}$ are independent
from unit to unit and that the fpc's can be ignored.
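A small simulation can make result (13.17) and the unbiasedness of the ordinary variance estimate concrete. The sketch below is only an illustration under assumed values: true values drawn from a normal population with mean 50 and variance 4, and an error of measurement that is partly correlated with the true value. None of these numbers or function names come from the text.

```python
import random
import statistics


def theoretical_variance(n: int) -> float:
    """V(ybar) from (13.17) for the assumed population used in simulate()."""
    sigma_eta2 = 4.0                      # variance of the true values
    sigma_eps2 = 0.25 ** 2 * 4.0 + 1.0    # variance of the measurement errors
    cov = 0.25 * 4.0                      # rho * sigma_eta * sigma_eps
    return (sigma_eta2 + sigma_eps2 + 2.0 * cov) / n


def simulate(n: int = 20, reps: int = 4000, seed: int = 12345):
    """Return (empirical variance of ybar, average of the usual estimate v(ybar))."""
    rng = random.Random(seed)
    sample_means = []
    estimated_variances = []
    for _ in range(reps):
        ys = []
        for _ in range(n):
            eta = rng.gauss(50.0, 2.0)                       # true value
            eps = 0.25 * (eta - 50.0) + rng.gauss(0.0, 1.0)  # error, independent between units
            ys.append(eta + eps)
        sample_means.append(statistics.fmean(ys))
        estimated_variances.append(statistics.variance(ys) / n)  # s_y^2 / n
    empirical_v = statistics.pvariance(sample_means, mu=50.0)
    return empirical_v, statistics.fmean(estimated_variances)


if __name__ == "__main__":
    emp, est = simulate()
    print("theory   :", theoretical_variance(20))
    print("empirical:", round(emp, 4))
    print("mean v   :", round(est, 4))
```

With errors that are independent from unit to unit, both the empirical variance of the sample mean and the average of the usual estimate should fall close to the theoretical value.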

When n/N is not negligible, some change is required in the formulas. 
The development will be sketched briefly for the mean of a simple 
random sample. With a finite population, the population variances 
and covariance will be defined as follows: 


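$$S_\eta^2 = \frac{1}{N - 1} \sum_{i=1}^{N} (\eta_i - H)^2$$
$$\sigma_\epsilon^2 = \frac{1}{N} \sum_{i=1}^{N} E_\alpha(\epsilon_{i\alpha}^2)$$
$$\rho_{\eta\epsilon} S_\eta \sigma_\epsilon = \frac{1}{N - 1} \sum_{i=1}^{N} E_\alpha\{\epsilon_{i\alpha}(\eta_i - H)\}$$

Note the use of the divisor $N$ in defining $\sigma_\epsilon^2$: this helps to keep the
results simple.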
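The error of estimate of the sample mean is

$$\bar{y} - H = (\bar{\eta} - H) + \bar{\epsilon} \qquad (13.19)$$

By theorem 2.3,

$$E(\bar{\eta} - H)^2 = \frac{(N - n)}{Nn} S_\eta^2 \qquad (13.20)$$

For the average value of $\bar{\epsilon}^2$, we have

$$E_\alpha(\bar{\epsilon}^2) = \frac{1}{n^2} \sum_{i=1}^{n} E_\alpha(\epsilon_{i\alpha}^2)$$

since the errors of measurement on different units in the sample are by
hypothesis independent. When the average is taken over all simple
random samples of size n, we obtain

$$E\,E_\alpha(\bar{\epsilon}^2) = \frac{1}{nN} \sum_{i=1}^{N} E_\alpha(\epsilon_{i\alpha}^2) = \frac{\sigma_\epsilon^2}{n} \qquad (13.21)$$

For the cross-product, $2\bar{\epsilon}(\bar{\eta} - H)$, let us suppose first that there is
a fixed error of measurement $\epsilon_{i\alpha}$ associated with the ith unit. Then,
by the ordinary theory for simple random samples from a finite
population (theorem 2.3),

$$E\{\bar{\epsilon}(\bar{\eta} - H)\} = \frac{(N - n)}{Nn} \, \frac{\sum_{i=1}^{N} \epsilon_{i\alpha}(\eta_i - H)}{(N - 1)}$$

where this average is over all simple random samples with this fixed
set of errors of measurement. Taking the average over all possible
sets of these errors, we have, by the definition of $\rho_{\eta\epsilon}$,

$$E\,E_\alpha\{\bar{\epsilon}(\bar{\eta} - H)\} = \frac{(N - n)}{Nn} \rho_{\eta\epsilon} S_\eta \sigma_\epsilon \qquad (13.22)$$

Finally, from (13.20), (13.21), and (13.22),

$$V(\bar{y}) = E(\bar{y} - H)^2 = \frac{(N - n)}{Nn} \{S_\eta^2 + 2\rho_{\eta\epsilon} S_\eta \sigma_\epsilon\} + \frac{\sigma_\epsilon^2}{n} \qquad (13.23)$$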
This formula replaces (13.17) when the fpc is not negligible. If
$n = N$, the variance reduces to $\sigma_\epsilon^2/N$ instead of to zero, reflecting the
fact that some error of measurement remains in the average of N
independent measurements.
With this model, the mean square deviation from the sample mean,

$$s_y^2 = \frac{\sum (y_{i\alpha} - \bar{y})^2}{n - 1}$$

can be shown to be an unbiased estimate of

$$S_y^2 = S_\eta^2 + 2\rho_{\eta\epsilon} S_\eta \sigma_\epsilon + \sigma_\epsilon^2$$

The usual formula (section 2.6) for the estimated variance of $\bar{y}$ is

$$v(\bar{y}) = \frac{(N - n)}{Nn} s_y^2$$

Hence

$$E\,v(\bar{y}) = \frac{(N - n)}{Nn} \{S_\eta^2 + 2\rho_{\eta\epsilon} S_\eta \sigma_\epsilon + \sigma_\epsilon^2\}$$

By comparison with (13.23) it follows that $v(\bar{y})$ has a negative bias
which amounts to $\sigma_\epsilon^2/N$ and will usually be small. If the fpc is omitted
from $v(\bar{y})$, we obtain an overestimate. An unbiased estimate cannot
be constructed without knowledge of $\sigma_\epsilon^2$.
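The size of this bias is easy to check numerically. The sketch below simply evaluates (13.23) and the expected value of the usual estimate for illustrative inputs; the function names and the numbers are assumptions made for the example, and `cov` stands for the product $\rho_{\eta\epsilon} S_\eta \sigma_\epsilon$.

```python
import math


def true_variance(N: int, n: int, S_eta2: float, cov: float, sigma_eps2: float) -> float:
    """V(ybar) from (13.23): the fpc applies to the true-value terms only."""
    return (N - n) / (N * n) * (S_eta2 + 2.0 * cov) + sigma_eps2 / n


def expected_estimated_variance(N: int, n: int, S_eta2: float, cov: float,
                                sigma_eps2: float) -> float:
    """E v(ybar) when the usual formula (with fpc) is applied to the measured values."""
    return (N - n) / (N * n) * (S_eta2 + 2.0 * cov + sigma_eps2)


if __name__ == "__main__":
    N, n = 1000, 100
    S_eta2, cov, sigma_eps2 = 4.0, 0.3, 1.0   # illustrative values only
    V = true_variance(N, n, S_eta2, cov, sigma_eps2)
    Ev = expected_estimated_variance(N, n, S_eta2, cov, sigma_eps2)
    print("V(ybar)   =", V)
    print("E v(ybar) =", Ev)
    print("bias      =", Ev - V, " (should equal", -sigma_eps2 / N, ")")
    # With n = N the variance does not vanish: it reduces to sigma_eps2 / N.
    assert math.isclose(true_variance(N, N, S_eta2, cov, sigma_eps2), sigma_eps2 / N)
```

For these inputs the usual estimate is too small by exactly $\sigma_\epsilon^2/N$, which is a small fraction of the variance itself unless the sampling fraction is large.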


13.9 Effects of correlation between errors on different units. With
the same model,

$$y_{i\alpha} = \eta_i + \epsilon_{i\alpha}$$

suppose that there are correlations between the $\epsilon_{i\alpha}$ on different units
in the same sample. Simple random sampling from an infinite
population is assumed.

In finding $E(\bar{y} - H)^2$, the only term to be added to the result (13.17)
in the preceding section is the cross-product term

$$\frac{2}{n^2} E\,E_\alpha \Big\{ \sum_{i<j} \epsilon_{i\alpha}\epsilon_{j\alpha} \Big\} \qquad (13.24)$$

We may define the average within-sample correlation coefficient $\bar{\rho}_\epsilon$
by the equation

$$\bar{\rho}_\epsilon \sigma_\epsilon^2 = \frac{2}{n(n - 1)} E\,E_\alpha \sum_{i<j} \epsilon_{i\alpha}\epsilon_{j\alpha}$$

Hence the cross-product term (13.24) contributes

$$\frac{(n - 1)}{n} \bar{\rho}_\epsilon \sigma_\epsilon^2$$
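Adding this term to (13.17), we have

$$V(\bar{y}) = \frac{1}{n} [\sigma_\eta^2 + 2\rho_{\eta\epsilon}\sigma_\eta\sigma_\epsilon + \sigma_\epsilon^2 \{1 + (n - 1)\bar{\rho}_\epsilon\}] \qquad (13.25)$$

The average value of $v(\bar{y})$ is found in the same way to be

$$E\,v(\bar{y}) = \frac{1}{n} [\sigma_\eta^2 + 2\rho_{\eta\epsilon}\sigma_\eta\sigma_\epsilon + \sigma_\epsilon^2 \{1 - \bar{\rho}_\epsilon\}] \qquad (13.26)$$

Thus the standard estimated variance is biased. Since $\bar{\rho}_\epsilon$ appears
to be positive for most types of measurement error, $v(\bar{y})$ is usually an
underestimate. Whether the underestimation is serious depends on
the relative sizes of $\sigma_\eta^2$ and $\sigma_\epsilon^2$ as well as on the value of $\bar{\rho}_\epsilon$.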


The analogy with systematic sampling, or more generally with 
cluster sampling, is apparent. 
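To see how serious the underestimation can be, formulas (13.25) and (13.26) may be evaluated side by side. The sketch below does this for illustrative inputs; the function names and values are assumptions for the example, and `cov` stands for $\rho_{\eta\epsilon}\sigma_\eta\sigma_\epsilon$.

```python
import math


def true_variance_correlated(n: int, sigma_eta2: float, cov: float,
                             sigma_eps2: float, rho_bar: float) -> float:
    """V(ybar) from (13.25), with average within-sample error correlation rho_bar."""
    return (sigma_eta2 + 2.0 * cov + sigma_eps2 * (1.0 + (n - 1) * rho_bar)) / n


def expected_usual_estimate(n: int, sigma_eta2: float, cov: float,
                            sigma_eps2: float, rho_bar: float) -> float:
    """E v(ybar) from (13.26) for the ordinary estimate s_y^2 / n."""
    return (sigma_eta2 + 2.0 * cov + sigma_eps2 * (1.0 - rho_bar)) / n


if __name__ == "__main__":
    n, sigma_eta2, cov, sigma_eps2 = 100, 4.0, 0.0, 1.0   # illustrative values
    for rho_bar in (0.0, 0.01, 0.05, 0.10):
        V = true_variance_correlated(n, sigma_eta2, cov, sigma_eps2, rho_bar)
        Ev = expected_usual_estimate(n, sigma_eta2, cov, sigma_eps2, rho_bar)
        print(f"rho_bar = {rho_bar:4.2f}  V = {V:.4f}  E v = {Ev:.4f}  ratio = {Ev / V:.3f}")
    # Subtracting (13.26) from (13.25), the shortfall is rho_bar * sigma_eps2 for any n.
    assert math.isclose(
        true_variance_correlated(n, sigma_eta2, cov, sigma_eps2, 0.05)
        - expected_usual_estimate(n, sigma_eta2, cov, sigma_eps2, 0.05),
        0.05 * sigma_eps2,
    )
```

Because the shortfall $\bar{\rho}_\epsilon\sigma_\epsilon^2$ does not shrink with n while the variance itself does, even a small positive correlation can make the usual estimate badly misleading in large samples.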


[...] units is divided at random into k
groups of units, each group containing $m = n/k$ units. The field work
[...] model, it is convenient to label the
units by a double subscript notation. Let

$$y_{ij\alpha} = \eta_{ij} + \epsilon_{ij\alpha}$$

where i denotes the group and j the member within the group. The
population is assumed infinite, and the average of the $\epsilon_{ij\alpha}$ over the
population is assumed zero.

Since group i is a random subsample of a simple random sample, it
is itself a simple random sample from the population. Hence, by
equation (13.25), the variance of the group mean $\bar{y}_i$ is

$$V(\bar{y}_i) = \frac{1}{m} [\sigma_\eta^2 + 2\rho_{\eta\epsilon}\sigma_\eta\sigma_\epsilon + \sigma_\epsilon^2 \{1 + (m - 1)\bar{\rho}_\epsilon\}]$$

Since errors are independent in the different groups,

$$V(\bar{y}) = \frac{1}{k} V(\bar{y}_i) = \frac{1}{n} [\sigma_\eta^2 + 2\rho_{\eta\epsilon}\sigma_\eta\sigma_\epsilon + \sigma_\epsilon^2 \{1 + (m - 1)\bar{\rho}_\epsilon\}] \qquad (13.27)$$

putting $n = mk$.


If, as before, the variance of a single measurement of a unit by an
interviewer chosen at random is denoted by

$$\sigma_y^2 = \sigma_\eta^2 + 2\rho_{\eta\epsilon}\sigma_\eta\sigma_\epsilon + \sigma_\epsilon^2$$

then (13.27) may be written

$$V(\bar{y}) = \frac{\sigma_y^2 - \bar{\rho}_\epsilon\sigma_\epsilon^2}{n} + \frac{\bar{\rho}_\epsilon\sigma_\epsilon^2}{k} \qquad (13.28)$$

An alternative model which produces the same variance formula is

$$y_{ij\alpha} = \eta_{ij} + g_i + \epsilon_{ij\alpha} \qquad (13.29)$$

In this model $g_i$ is the personal bias of the ith interviewer. Over the
population of interviewers, $E(g_i) = 0$ (i.e. there is no bias common to
all interviewers), and $E(g_i^2)$ is denoted by $\sigma_g^2$. The "random" errors
of measurement, $\epsilon_{ij\alpha}$, are now assumed to be uncorrelated from unit to
unit, although they may be correlated with the $\eta_{ij}$, and to average to
zero over the population.

In repeated sampling, we select each time a random sample of k 
interviewers from an assumed infinite population of interviewers. 
Each interviewer is assigned to a random subsample of a simple ran- 
dom sample of units. 

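In order to find $V(\bar{y})$ we write as usual

$$\bar{y} - H = (\bar{\eta} - H) + \bar{g} + \bar{\epsilon}$$

When the square is averaged over all samples, there is no contribution
from the product $\bar{g}(\bar{\eta} - H)$, because in repeated sampling a given set
of interviewers appears equally often with any given set of units. For
the same reason, $E\{\bar{g}\bar{\epsilon}\}$ is also zero. Hence

$$V(\bar{y}) = \frac{\sigma_\eta^2 + 2\rho_{\eta\epsilon}\sigma_\eta\sigma_\epsilon + \sigma_\epsilon^2}{n} + \frac{\sigma_g^2}{k} \qquad (13.30)$$

$$= \frac{\sigma_y^2 - \sigma_g^2}{n} + \frac{\sigma_g^2}{k} \qquad (13.31)$$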
k 
by substituting $\sigma_y^2$ as found from (13.29). This is the analogue of
(13.28).

This formula shows that there is a limit to the precision obtainable
with a fixed number k of interviewers. For given n, the precision is 
greatest when k = n, that is, when each interviewer measures only one 
unit. (This conclusion may be unrealistic, since it assumes that inter- 
viewers are drawn at random from a pool of interviewers of equal 
quality. In practice, average quality may well be higher when only a 
few interviewers are to be recruited than when many must be found.) 
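The limit on precision with a fixed number of interviewers can be seen by evaluating (13.31) for increasing n. The sketch below uses illustrative values of $\sigma_y^2$ and $\sigma_g^2$ that are assumptions for the example, not figures from the text.

```python
import math


def variance_with_interviewers(n: int, k: int, sigma_y2: float, sigma_g2: float) -> float:
    """V(ybar) from (13.31): n units shared among k interviewers."""
    return (sigma_y2 - sigma_g2) / n + sigma_g2 / k


if __name__ == "__main__":
    sigma_y2, sigma_g2, k = 10.0, 1.0, 10   # illustrative values only
    for n in (100, 1_000, 10_000, 1_000_000):
        print(f"n = {n:>9}  V = {variance_with_interviewers(n, k, sigma_y2, sigma_g2):.6f}")
    print("limit as n grows (sigma_g2 / k):", sigma_g2 / k)
    # With one unit per interviewer (k = n) the variance is simply sigma_y2 / n.
    assert math.isclose(variance_with_interviewers(100, 100, sigma_y2, sigma_g2),
                        sigma_y2 / 100)
```

However large n becomes, the variance cannot fall below $\sigma_g^2/k$ under this model, which is why the number of interviewers matters as well as the number of units.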

With the same assumptions, it is easily verified that an unbiased
estimate of $V(\bar{y})$ from the sample is

$$v(\bar{y}) = \frac{\sum_{i=1}^{k} (\bar{y}_i - \bar{y})^2}{k(k - 1)}$$

This is the most useful property of the method of interpenetrating
subsamples. Note, however, that the number of degrees of freedom on
which the estimate is based is $(k - 1)$: this will be small if only a few
interviewers participate.

The analysis of variance of the sample data (table 13.6) is also of
interest. The expectations shown for the mean squares can be verified
from the model.

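TABLE 13.6 EXPECTATIONS IN THE ANALYSIS OF VARIANCE OF THE SAMPLE
(ON A SINGLE-UNIT BASIS)

                  df          ms                                                               Expectations
Between groups    (k - 1)     $s_b^2 = m \sum_i (\bar{y}_i - \bar{y})^2 / (k - 1)$               $(\sigma_y^2 - \sigma_g^2) + m\sigma_g^2$
Within groups     k(m - 1)    $s_w^2 = \sum_i \sum_j (y_{ij} - \bar{y}_i)^2 / \{k(m - 1)\}$      $(\sigma_y^2 - \sigma_g^2)$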
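The estimate $v(\bar{y})$ and the two mean squares of table 13.6 can be computed directly from the group data. The sketch below assumes k groups of equal size m, supplied as a list of lists; the function name, the dictionary keys, and the toy data are illustrative. The estimate of $\sigma_g^2$ shown is the one implied by the expectations in the table, namely $(s_b^2 - s_w^2)/m$.

```python
from statistics import fmean


def interpenetrating_analysis(groups):
    """Analysis of k interpenetrating subsamples of equal size m.

    groups: list of k lists, each holding the m measurements made by
    one interviewer (or team) on a random subsample of the units.
    """
    k = len(groups)
    m = len(groups[0]) if k else 0
    if k < 2 or m < 2 or any(len(g) != m for g in groups):
        raise ValueError("need at least 2 groups of equal size m >= 2")

    group_means = [fmean(g) for g in groups]
    grand_mean = fmean(group_means)          # equals the overall mean for equal m

    between_ss = sum((gm - grand_mean) ** 2 for gm in group_means)
    within_ss = sum((y - gm) ** 2
                    for g, gm in zip(groups, group_means) for y in g)

    s_b2 = m * between_ss / (k - 1)          # between-groups mean square
    s_w2 = within_ss / (k * (m - 1))         # within-groups mean square
    return {
        "mean": grand_mean,
        "v_mean": between_ss / (k * (k - 1)),   # estimate of V(ybar), (k - 1) df
        "s_b2": s_b2,
        "s_w2": s_w2,
        "sigma_g2_hat": (s_b2 - s_w2) / m,      # from the expectations in table 13.6
    }


if __name__ == "__main__":
    toy = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]     # made-up data, 3 interviewers
    for name, value in interpenetrating_analysis(toy).items():
        print(f"{name:>13}: {value:.4f}")
```

Note that $s_b^2$ equals $n\,v(\bar{y})$ with $n = mk$, so the between-groups mean square is just the estimated variance of the mean put on a single-unit basis.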


The variance within groups $s_w^2$ is an unbiased estimate of the
variance per unit that would be obtained if interviewer biases could be
removed, while $s_b^2$ estimates the actual variance per unit in the survey.
Comparison of the two mean squares reveals the extent of the loss in
precision from the biases.

The formula for $V(\bar{y})$ can also be used to determine the optimum
number of groups for a given field cost situation. A simple type of
cost function that might apply is

$$C = c_1 n + c_2 k$$
If n and k are chosen to minimize $V(\bar{y})$ for fixed C, we find that the
optimum size of group (number of interviews per interviewer) is

$$\left(\frac{n}{k}\right)_{opt} = \frac{\sqrt{\sigma_y^2 - \sigma_g^2}}{\sigma_g} \sqrt{\frac{c_2}{c_1}}$$

A sample estimate from a preliminary sample is

$$\mathrm{Est.} \left(\frac{n}{k}\right)_{opt} = \frac{\sqrt{m}\, s_w}{\sqrt{s_b^2 - s_w^2}} \sqrt{\frac{c_2}{c_1}}$$

Structurally, these equations are the same as those in section 10.6 
for determining optimum sampling and subsampling ratios in two- 
stage sampling. In fact, the whole mathematical approach is the same 
as for two-stage sampling. This analysis is subject to the assumption 
already mentioned that we can vary the number of interviewers with- 
out affecting their average quality. 

The method is not applicable if the groups are not random
subsamples. A common practice is to assign each interviewer to a small
geographic area near his home, in order to decrease travel cost per
interviewer. If there are real differences between the averages $\bar{\eta}_i$ for
different areas, these differences appear in the analysis of variance as
if they were interviewer biases. Thus $s_b^2$ becomes an overestimate of
the actual variance per unit, and $s_w^2$ an underestimate of the variance
that would apply if interviewer biases could be removed.

One way of avoiding this difficulty without too great an increase in 
field costs is to stratify the sample into compact areas. The sample in 
each stratum is then divided into random subgroups. Each inter- 
viewer is required to travel over the whole of a stratum but not over 
the whole sample. The analysis presented here then applies to an
individual stratum, and an unbiased estimate of $V(\bar{y}_{st})$ is built up in the
usual way. For other solutions of the problem, see Hansen et al. (1951).

This technique of interpenetrating subsamples is skillful, since it 
gives not only an unbiased estimate of error but also an insight into 
the magnitude of the effects of correlations in the errors of measure- 
ment on different units. Its utility is not confined to the handling of 
interviewer biases: the different groups may represent different field
teams who are taking crop samples, or different medical teams who are
making diagnostic examinations, and so on. Limitations of the
technique are the increase in travel costs, which may be considerable in
some surveys, and the fact that the technique does not handle any
bias that is constant over all units.
13.11 Summary. From the point of view of their effects on the
formulas given in previous chapters, the additional sources of error may
be classified as follows:

i. Errors of measurement that are independent from unit to unit
and average to zero over the whole population are properly taken into
account in the usual formulas for computing the standard errors of the
estimates, provided that the fpc is negligible. Such errors do, of
course, decrease the precision, and it is worth while to learn something 
about their magnitude in order to find out whether the decrease is 
serious. 

ii. With non-response, the usual formulas for the standard errors, as
computed from the units that were measured, are likely to be
underestimates since they ignore the bias due to differences between
respondents and non-respondents. The sampler has no excuse for not being
aware of this problem: a complete record of all non-response, with 
reasons, is an essential part of good practice. If non-response can be 
reduced by expenditure of greater effort on a certain segment of the 
population, the method of Hansen and Hurwitz (1946) shows how to 
allocate resources to this segment. 


iii. If errors of measurement are correlated from unit to unit, the [...]


iv. A constant bias that affects all units alike is hardest of all to
detect. No manipulations of the sample data will reveal this bias.

There is much work to be done on these problems. Perhaps the most
urgent need is for the accumulation of data on the nature and size of 
errors of measurements in sample surveys. In many cases it will be 
found that the usual formulas and techniques are disturbed to only a 
minor degree. In others it will become evident, as has happened in 
some types of study, that random sampling errors are the least of our 
troubles, and that precise estimates are unattainable until a drastic 
reduction in one of the other sources of error is made. More needs to 
be learned, also, about what can be accomplished by good training and
supervision and by rechecks of the units by more experienced
personnel.
ANSWERS TO EXERCISES


2.4 P = 51,473. Pr about 0.9.
2.5 SE (in 1000’s) = (i) 14,800; (ii) 3900; (iii) 3140.
2.6 9.2. (i) 2.7; (ii) 2.4.


3.2 1064, 1336. 

3.3 Nearly conclusive. 

3.6 (i) 76.2 + 3.6 per cent; (ii) 1738 + 280 families. 
3.7 Ay = 13. 

3.8 Average size of sample = m/P. 


4.1 (i) 2475; (ii) 4950. 

4.2 n = 21 (taking t = 2). 

4.3 n = 484. For number of unemployed, cv would be about 15 per cent. 
4.4 62 more. 


UNS \* 
4.6 ieee | 
p (Ea) 


5.1 (i) $n_1 = 375$, $n_2 = 625$; (ii) $n_1 = 250$, $n_2 = 750$.
5.2 RP = 181 per cent for proportional allocation and 214 per cent for optimum
allocation.
5.3 Best point of division is at 120 acres. RP = 82 per cent.


yore vo 


5.6 nV (Ja) = 1 — a-e 


5.7 (i) Gain in precision is about 110 per cent. (ii) Gain from proportional 
stratification over simple random sampling is about 90 per cent. 
5.8 (i) 3.733; (ii) 1.111; (iii) 8.222.


6.1 Gain = 66 per cent. At least 11 units by the ratio method. 

6.2 Quadratic limits (27,100; 29,870): normal limits (27,030; 29,700). 

6.3 Variance is 0.00184 computed by the ratio method and 0.00160 by the 
binomial formula. 

6.4 The variances are 46.5 for the separate ratio estimate and 40.6 for the
combined ratio estimate. In both cases the contribution of bias to the variance
is negligible.

7.1 $V(\bar{y}_{lr}) = 1.03$, $V(\bar{y}_R) = 10.3$ (one of the samples gives a very poor
estimate with the ratio method). Values of $B/\sigma$ are 0.32 and 0.27, respectively, where
$\sigma$ is taken about $\bar{Y}$.
7.2 $\hat{Y}_{lr}$ = 28,177 ± 570. RP = 113 per cent.

7.3 27,751 ± 694.
7.4 $V(\bar{y}_{lr})$ = 34.5; $V(\bar{y}_{lrc})$ = 10.3.
8.1 Variances are 8.19 (systematic), 11.27 (simple random), 8.25 (stratified, 2),
7.46 (stratified, 1).

8.2 No: variance is 8.78 with end corrections.

8.5 $V_{sy}$ = 0.00141, $V_{ran}$ = 0.00340.


9.1 Relative net precisions are 111, 125, and 128, respectively, for the last three
types of unit relative to the first.

9.2 Relative precision of the household is 211 per cent for the sex ratio and 38 
per cent for the proportion who had seen a doctor. 

9.3 Relative precision of the large unit is 0.566 with simple random sampling
and 0.625 with stratified random sampling.

9.5 (i) If the standard deviation among large units in class ha Mna. (ii) If
probability œ V Mp.


10.1 2.
10.2 Loss in precision is about 8 per cent.


11.2 Contributions to variance from

Methods    Within units    Between units    Bias     Total
Ia         0.356           0.250            0.010    0.616
II         0.468           0.610            ....     1.108
III        0.404           0.240            ....     0.644
IV         0.386           [...]            [...]    0.800
V          0.350           0.248            0.002    0.600


11.3 Total variance: 0.00482 (Ia), 0.02337 (II), 0.00554 (III).
11.5 Exact variance is 


-N $fw-» sè 
Yn) = nM? 2 [S (Y: — F} + MAM; — m;) =| 


The variance formula deduced from theorem 11.2 is the same except that the factor 
(N — n)/(N — 1) inside the bracket is replaced by 1. 


11.6 To compute $v(\bar{y})$, use formula (11.29) in theorem 11.4, with $z_i = 1/N$.
This gives

$$v(\bar{y}) = \frac{N \Sigma}{n(n - 1)M^2}$$

where N = 2823, M = 50,000, and $\Sigma$ is the sum of squares of deviations of the
quantities $M_i \bar{y}_i$ which appear in table 11.7. The SE is 2.45 years.

12.2 n’ > lôn.

12.3 By formula 12.29, SE = 1.25.

12.4 Per cent gains from the second to the sixth occasion are 50, 75, 91, 100,
and 105, respectively.


NOTE. Owing to differences in rounding procedures, readers’ answers may
differ slightly from the above in the last significant figure.
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illustration of precision, 133 
comparison with separate ratio esti- 
mate, 132 
optimum allocation with, 135 
Combined regression estimate, 153 
variance, 153, 155 
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Combined regression estimate, estimated 
variance, 157 
bias, 156 i 
comparison with separate regression 
estimate, 157 
Complete census compared with sample 
survey, 2 
Compromise allocation of sample sizes 
to strata, 85 
Conditional distribution. 
of proportions, 45 
worked example for proportions, 46 
of regression estimate, 145, 276 
Confidence limits 
in simple random sampling, 20 
for proportions or percentages, 39, 44 
in stratified random sampling, 73 
for ratio estimates, 120 
for optimum size of subsample, 228 
validity of normal approximation, 
22, 41 
effect of bias on, 8 
effect of non-response on, 294 
conditional, 45 
Consistency ‘ 
definition, 13 
of mean of simple random sample, 13 
of ratio estimate, 114 
of regression estimate, 141 
Correction for continuity, 40 
Correlation coefficient 
in finite populations, 116, 143 
intra-cluster, 164, 202 
within a systematic sample, 164, 165 
Correlogram, 174-176 
Cost functions 
in determining sample size, 61 
in stratified random sampling, 75 
in analytical studies, 107 
in determining optimum size of unit, 
200 
in determining optimum subsampling 
fraction, 225 
in determining optimum probabilities 
of selecting primary units, 253 
in determining optimum sampling 
fraction for non-respondents, 298 
in double sampling for stratification, 
272 
in double sampling for regression esti- 
mates, 278 
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Cost functions, with interpenetrating j 
subsamples, 314 } 

Covariance of sample means 17 

Cumulant, x4, 28 f 


Degrees of freedom, 20, 57 
effective number in stratified sam- 
pling, 73 
Domains of study, 107 
Double sampling, 103 
description, 268 
application to the problem of non- 
response, 298 
Double sampling with ratio estimates, 
281 
variance, 281 | 
estimated variance, 282 
Double sampling with regression esti- 
mates, 275 | 
variance, 278 7 
estimated variance, 280 
comparison with simple random sam- 
pling, 278 
optimum sample sizes, 278 
Double sampling with stratification, 269 
variance, 270 
estimated variance, 273 
comparison with simple random sam- 
pling, 272 
optimum sample sizes, 271 


E, average over all possible samples, 
14 


Elements, 34, 125 
End corrections, 172 
Error, limits of, 51, 53, 55 
Errors in surveys, types of, 292 
Errors of measurement, 304 
mathematical model for, 305 
effects of constant bias, 306 
effects of errors that are independent 
from unit to unit, 308 
effects of correlation betw 
on different, units, 311 


use of interpenetrating subsamples, 
312 


summary of effects, 316 
Estimates of population variances 
for determining sample size, 56 


for allocation in stratified sampli 
81, 82 pia 


een errors 


INDEX 


Estimates of population variances, for 
comparing different types of unit, 
195 
for determining subsampling fraction 
in two-stage sampling, 226 
Expansion factor, 13 
Eye estimates, 141 


Field costs, effect on optimum size of 
unit, 202 
Finite population correction (fpe), 17 
rule for ignoring, 17 
effect on size of sample for specified 
limits of error, 54, 56, 88, 120 
in stratified random sampling, 69 
for ratio estimates, 115 
for regression estimates, 142 
in two-stage sampling, 222 
Frame, 4 


Geographic stratification, 96, 134 
Grid sample in two dimensions, 183 


Hypergeometric distribution, 37, 44 
confidence limits for, 39 
charts of (reference), 42 
worked example for, 38 


Incomplete stratification, 225 ` 
Inflation factor, 13 
Interpenetrating subsamples, 312 

variance, 313 

estimated variance, 314 

optimum size of subsample, 315 
Interviewer bias, 305 
Intra-cluster correlation, 164, 202 
Item, definition, 12 


Kurtosis, 28 
effect on sample variance, 28 


Latin squares, use in systematic sam- 
pling, 184 

Least squares estimates, 123, 140, 145, 
153 

Limits of error (tolerable), 52 

Linear regression estimate, see Regres- 
sion estimate 

Listing of primary units, 241, 253 

effect of listing cost on optimum prob- 

ability of selection, 256 

Loss due to errors of estimation, 61 
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Mail surveys, 293, 301 
Matching (in repeated sampling of same 
population), 282, 286, 290 

Maximum likelihood estimates, 111 
Mean 

of sample (g), 12 

of population (F), 12 
Measures of size, 205 
Multinomial distribution, 43, 207 
Multistage sampling, 229 

See also Subsampling 


Non-normality 
frequently encountered in sampling 
practice, 23 
effect on confidence limits, 22, 41 
effect on sample variance, 27 
effect of stratification on, 27 
Non-response, 4, 292 
illustration of, 293 
bias due to, 294 
effect on confidence limits, 294 
estimation of sample size with, 296 
optimum sampling fraction among 
non-respondents, 298 
“substitution” method, 302 
Politz and Simmons’ method, 303 
Normal distribution, 5, 20 
validity of use for sample estimates, 
22 
as approximation to binomial distri- 
bution, 41 
Notation 
for simple random sampling, 12 
for variances of estimates, 19 
for proportions, 31, 43 
for stratified sampling, 66 
for ratio estimates, 112 
for two-stage sampling, 215, 235 
for errors of measurement, 305 


Optimum allocation in stratified sam- 

pling 

fixed sample size, 73 

fixed total cost, 75 

determination from previous data, 81 

comparison with proportional alloca- 
tion, 76, 80, 84 

comparison with simple random sam- 
pling, 76 

with more than one item, 84 
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Optimum allocation in stratified sam- 
pling, requiring more than 100% 
sampling, 86 

in sampling for proportions, 91 
in analytical studies, 107 
with ratio estimates, 135 
with double sampling, 271 
effect of deviations from optimum, 
78 
effect of errors in strata variances sp, 
82 
Optimum per cent matched 
in sampling on two occasions, 286 
in sampling on more than two occa- 
sions, 290 
Optimum size of subsample 
primary units of equal size, 225 
primary units of unequal size, 253 
Overall sampling fraction, 240, 246 


Percentages, estimation of, see Propor- 
tions, estimation of 
Periodic variation, effect on systematic 
sampling, 174, 179 
Poisson distribution, 24, 57 
Politz and Simmons’ method for han- 
dling non-response, 303 
Population 
definition, 3 
with linear trend, 170 
with periodic variation, 174 
autocorrelated, 174 
two-dimensional, 183 
Precision 
vs. accuracy, 10 
specification of, 52 
relative, 76, 79 
Primary sampling units (primary units), 
215 
Probability proportional to estimated 
size 
in single-stage sampling, 206 
in two-stage sampling, 239 
Probability proportional to size (pps), 
206 
method of drawing sample, 206 
in single-stage sampling, 206-212 
comparison with ratio estimate, 210 
in stratified single-stage sampling, 212 
in two-stage sampling, 237, 243, 246, 
247, 248, 256, 262 


INDEX 


Probability proportional to size (pps), 
in stratified two-stage sampling, 
263, 265 

Probability sampling, definition and 
properties, 6 

Proportional allocation in stratified 
sampling, 67 

self-weighing sample obtained, 67 

rule for use of, 81 

variance, 69 

in sampling for proportions, 92 

comparison with simple random sam- 
pling, 76, 78 

comparison with optimum allocation, 
76, 80, 84, 92 

comparison with stratification after 
selection, 104 

estimation of gain in precision from, 
100 

Proportions, estimation of, 31 

in simple random sampling, 31 

more than two classes, 43-45 

in stratified random sampling, 90 

in cluster sampling, 34, 124, 203 

in two-stage sampling (units of equal 
size), 228 

in two-stage sampling (units of un- 
equal sizes), 248 

in double sampling, 271 

size of sample for, 50, 53 


effect of population P on precision, 
35 


effect of non-response, 294 
Purposive selection, 7 


Quadratic confidence limits for ratio 
estimate, 121 

Qualitative characteristics, see Propor- 
tions, estimation of 

Quota sampling, 105 


Raising factor, 13 

“Random point” 
197 

Random sampling, see Simple random 
sampling 

Rare items, sampling for, 36, 48 

Ratio estimate, 111 

variance, 114-117 


method of selection, 


estimated variance, 118 
bias, 117 


Ratio estimate, conditions under which 
bias is negligible, 118 
consistency, 114 
confidence limits, 120 
optimum conditions for, 123 
in estimating proportions, 124 
in stratified random sampling, 129 
optimum allocation for, 135 
in cluster sampling, 124, 203 
in two-stage sampling, 248 
sample size with, 120 
comparison with mean per unit, 122 
comparison with regression estimate, 
148 
as special case of regression estimate, 
149 
comparison with stratification, 134 
comparison with pps sampling, 210 
limiting distribution, 127 
effect of measurement bias on, 307 
See also Combined ratio estimate and 
Separate ratio estimate 
Regression coefficient 
least squares, 140 
in finite populations, 142 
variance, 145 
combined from different strata, 153 
Regression estimate, 140 
uses, 140 
large-sample variance, 142 
estimated variance, 144 
least squares theory, 144 
bias, 147 
with arbitrary value of b, 149 
with inefficient estimate of b, 149 
i comparison with ratio estimate, 148 
comparison with mean per unit, 148 
in stratified random sampling, 150- 
158 
in double sampling, 275 
in repeated sampling of the same 
population, 284-290 
effect of measurement bias on, 307 
3 See also Combined regression estimate 
and Separate regression estimate 
Relative net precision, 192 
Relative precision (RP) 
of stratified random and simple ran- 
dom sampling, 76 4 
of optimum and general allocation, 
79 
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Repeated sampling of the same popula- 
tion, 282 
types of estimate wanted, 283 
replacement policy, 283, 286, 290 
sampling on two occasions, 284 
sampling on more than two occasions, 
286 
Replacement of sample, see Repeated 
sampling of the same population 


Sampling fraction, 13 
overall sampling fraction, 240, 246 
Sampling on more than two occasions, 
286 
estimate of current population mean, 
287 
optimum per cent matched, 290 
Sampling on two occasions, 284 
estimate of current population mean, 
284 
optimum per cent matched, 285 
Sampling ratio, 13 
Sampling unit (unit) 
definition, 3 
optimum measure of size of, 205 
Sampling unit, optimum, 189 
method for determining, 189-195 
worked examples of method, 189, 194, 
197 
use of survey data in determining, 195 
use of variance functions, 198 
effect of field costs on, 202 
for proportions, 203 
Sampling with replacement, 12, 206, 
245, 250, 259, 262 
Sampling without replacement, 12 
Selection with arbitrary probabilities 
in single-stage sampling, 207 
in two-stage sampling, 239 
optimum probabilities of selection of 
primary units, 253 
Self-weighting sample, 67 
Separate ratio estimate, 129 
variance, 129 
liability to bias, 130 
comparison with combined ratio esti- 
mate, 132 
estimated variance, 132 
illustration of precision, 133, 186 
optimum allocation for, 135 
Separate regression estimate, 150 
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Separate regression estimate, variance, 
151, 152 
liability to bias, 153 
comparison with combined regression 
estimate, 157 
estimated variance, 157 
Simple expansion, 122 
comparison with ratio estimate, 122 
Simple random sampling 
definition, 11 
method of drawing, 11 
unbiased sample mean, 14 
variance of sample mean, 15 
estimated variance of sample mean, 
19 
confidence limits for sample mean, 
20 
variance of sample proportion, 32, 35 
unbiased sample proportion, 32 
estimated variance of sample propor- 
tion, 33 
distribution of sample proportion, 36, 
37 
confidence limits for sample propor- 
tion, 39 
for classification into more than two 
classes, 43 
sample size needed, 50, 53, 55, 57 
precision compared with stratified 
random sampling, 76 
Size of sample for specified limits of 
error 
analysis of problem, 51 
with proportions, 50-54 
with continuous data, 55. 
with more than one item, 57 
in stratified random sampling, 87, 93 
with ratio estimates, 120 
worked examples, 50, 55, 56, 60, 89 
Stein’s method of two-stage sampling, 
` 59 
by minimizing cost plus loss due to 
errors, 61 
effect of non-response on, 296 
Size of sample needed 
for normal approximation to confi- 
dence limits for continuous data, 27 
for normal approximation to confi- 
dence limits of proportions, 41 
for estimating optimum subsampling 
fractions, 226 
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Skewed population, experimental sam- 
ples from, 22, 26 
Skewness 
coefficient of, 25 
effect on confidence limits, 22 
Square grid sample, 183 
Standard error 
of mean of simple random sample, 16, 
19, 55 
of estimated population total from 
simple random sample, 17, 19 
of sample standard deviation, 27 
of sample proportion, 32, 33, 34, 40, 
45, 54 
of total in population possessing some 
attribute, 33 
of weighted mean in stratified sam- 
pling, 69, 71, 72, 105 
of proportion estimated from strati- 
fied sample, 91 
of ratio estimate, 115-119, 125-127 
of ratio estimate in stratified sam- 
pling, 129, 131-134 
of regression estimate, 143-146, 150 
of mean of systematic sample, 163- 
167, 179-182 
of mean per element in two-stage 
sampling, 217, 222, 224, 236-243, 
249-253, 259-265 
in sampling with probability propor- 
tional to size, 208-210 
in double sampling, 270, 273, 278, 280, 
281, 282 
with interpenetrating subsamples, 314 
(Nore: in some cases the formula 
given is for the variance.) 
Stein’s method of two-sta 


ge sampling, 
a pling, 
Steps in a sample survey, 2 

Strata
definition, 65
construction, 93
optimum number, 94
effect of subdivision on precision, 93
optimum boundaries between, 96
Stratification, 65
reasons for, 65
best variable for, 93
geographic, 96, 134
after selection of sample, 104
with double sampling, 268
Stratification, effect on normality of
variate, 27
See also Strata


Stratified random sampling, 65
estimate, ȳ_st, 66
variance of ȳ_st, 69
estimated variance of ȳ_st, 72
confidence limits for continuous data,
73
optimum allocation, 73
optimum allocation with varying costs,
75
size of sample, 87
for proportions, 90
construction of strata, 93
estimation of gain in precision from
stratification, 97-102
with ratio estimates, 129
with regression estimates, 150
in analytical studies, 106
with one unit per stratum, 105
comparison with simple random sam-
pling, 76
comparison with systematic sampling,
167
effects of errors in stratum weights,
102, 268, 271
deliberate omission of a stratum, 294


Stratified sampling, 65
estimate, 66
variance of estimate, 68


Stratum weight, W_h, 69
Subsampling (units of equal size), 215
advantage, 215
notation, 215
approximate variance of mean, 217
exact variance of mean, 220
estimated variances, 218, 223
prediction of variance for other sub-
sampling fractions, 218
optimum subsampling fractions, 225
for proportions, 228
in stratified sampling, 231
three-stage sampling, 229
when subsample is systematic, 225
Subsampling (units of unequal size), 234
notation, 235
units chosen with equal probabilities,
235, 236, 243, 244, 245, 247
units chosen with probability propor-
tional to size, 237, 243, 245, 247
units chosen with probability pro-
portional to estimated size, 239,
243, 246, 247
estimation of proportions, 248
general formulas for variances, 249
general formulas for estimated vari-
ances, 259, 262
optimum probabilities of selection,
253
advantages of ratio estimates, 248,
250
comparison of biased and unbiased
estimates, 257
in stratified sampling, 262
planning of sample, 266
Subsampling of non-respondents, 298
“Substitution” method for non-response,
302
Super-population, 169
Systematic sampling, 160
advantages, 160
variance, 162
estimation of the variance, 179
worked example for, 165
recommendations about use, 185
relation to cluster sampling, 162
comparison with simple random sam-
pling, 163, 167, 168, 170
comparison with stratified sampling,
165, 167, 170, 176
end corrections, 172
in populations in “random” order,
168, 180
in populations with linear trend, 170,
180
effect of periodic variation, 174, 179
in autocorrelated populations, 174
in natural populations, 176
in two dimensions, 183
in subsampling, 225
stratified systematic sampling, 182


t-distribution, 20, 27, 57, 73
Theory of sampling, function in sample
surveys, 5
Three-stage sampling, 229
Total in population, estimation
by simple expansion, 13
by ratio estimate, 112
in stratified random sampling, 70




Total in population, estimation, for at- 
tributes, 31 
Two-dimensional population, 183 
square grid sample, 183 
unaligned systematic sample, 183 
simple random sample, 184 
use of latin square principle, 184 
Two-phase sampling, see Double sam- 
pling 
Two-stage sampling, definition, 215 
See also Subsampling 


Unaligned systematic sample, 183 
Unbiased procedure or estimate, defini-
tion, 7, 14
Unit (sampling unit), definition, 3 
See also Sampling unit 
Unrestricted random sampling, 11 
See also Simple random sampling 


Variance, definition of S² and σ², 15
Variances of sample estimates, see Stand-
ard error
Variance within units, as function of
size of unit, 198

